One of the main aims of a multi-species model, such as those implemented using Gadget, is to estimate the values of selected unknown parameters. The likelihood function serves as a general measure of how well a model with a given set of parameters fits the data, and parameter estimation is therefore undertaken by maximising the likelihood function over the values of the unknown parameters.
The form of the likelihood function for a particular model and data set will vary depending on the nature of the data. Since fisheries data come from various sources, a large number of different likelihood functions have been implemented in Gadget. When such different data sources are combined in one analysis, the likelihood function becomes a product of the likelihood functions of the individual data sets. The individual pieces are referred to as likelihood components.
As is common practice, maximum likelihood estimation of parameters is implemented in Gadget through minimising the negative log-likelihood. The negative log-likelihood function will be referred to as the objective function. Thus the objective function serves as a measure of the discrepancy between the output of the model and the measurements.
As noted in the introduction, several components enter the objective function in any single estimation. The objective function therefore becomes a weighted sum of several components: \[l = \sum_{i} w_i l_i \] The weights, \(w_i\), are necessary for several reasons. Notably, they can be used to prevent some components from dominating the likelihood function, to reduce the effect of low-quality data, and to act as a priori estimates of the variance in each subset of the data.
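As a minimal sketch of this structure in R, with made-up component scores and weights (the names and values below are placeholders, not output from a Gadget run):

```{r}
## Hypothetical negative log-likelihood scores for three components
l_i <- c(si = 245.2, ldist = 1310.7, alkeys = 88.4)

## Weights assigned to each component (arbitrary placeholder values)
w_i <- c(si = 1.5, ldist = 0.2, alkeys = 1.0)

## The objective function l = sum_i w_i * l_i
sum(w_i * l_i)
```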
In this setting the assignment of these weights is, as noted above, generally not trivial, except in the case of a weighted regression. In @taylor2007simple an objective reweighting scheme for likelihood components is described for cod in Icelandic waters. A simple heuristic, where the weights are the inverse of the initial sums of squares for the respective components, resulting in an initial score equal to the number of components, is sometimes used. This has the intuitive advantage that all components are normalised. There is, however, a drawback: the component scores, given the initial parametrisation, are most likely not equally far from their respective optima, and this in turn results in a sub-optimal weighting.
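A sketch of this inverse-SS heuristic, again with placeholder values: the weighted initial score of each component is 1, so the initial objective equals the number of components:

```{r}
## Initial component scores (sums of squares) at the starting parametrisation
ss0 <- c(si = 120.3, ldist = 5400.8, alkeys = 980.1)

## Inverse-SS weights: each weighted component initially scores 1
w0 <- 1 / ss0

## The initial objective is then the number of components (here 3)
sum(w0 * ss0)
```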
The iterative reweighting heuristic (first described in @stefansson2003issues, and inspired by the weighted regression case) tackles this problem by optimising each component separately in order to determine the lowest possible value for each component. These lowest values are then used to determine the final weights. The reasoning for this approach is as follows:
Conceptually the likelihood components can be thought of as residual sums of squares (SS), and as such their variance can be estimated by dividing the SS by the degrees of freedom. The optimal weighting strategy is then the inverse of the variance. The iteration starts by assigning the inverse SS as the initial weight, so that the initial score of each component, when multiplied by its weight, is 1. A series of optimisation runs is then performed, one for each component, where the initial contribution of that component to the objective function is set to 10000 while the other components each contribute only 1. After this series of optimisation runs the inverse of the resulting minimum SS is multiplied by the effective number of datapoints and used as the final weight for that particular component.
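The following is a schematic sketch of this procedure, not the actual Rgadget implementation; `run_optimisation` is a hypothetical stand-in for a single Gadget optimisation run that returns the minimised score of each component given a vector of weights:

```{r}
## Schematic sketch of the iterative reweighting heuristic.
## ss0:   initial component scores (sums of squares)
## n_eff: effective number of datapoints per component
## run_optimisation() is a hypothetical stand-in for one Gadget run,
## returning the component scores at the optimum for the given weights.
iterative_reweighting <- function(ss0, n_eff, run_optimisation) {
  w0 <- 1 / ss0                 # initial weights: each component scores 1
  min_ss <- numeric(length(ss0))
  for (i in seq_along(ss0)) {
    w <- w0
    w[i] <- 10000 * w0[i]       # component i now contributes 10000 initially
    ss <- run_optimisation(w)
    min_ss[i] <- ss[i]          # lowest attainable SS for component i
  }
  n_eff / min_ss                # final weights: inverse of the variance estimate
}
```

In practice this is what the `gadget.iterative` function automates, as illustrated below.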
The effective number of datapoints, used as a proxy for the degrees of freedom, is determined from the number of non-zero datapoints. This is viewed as a satisfactory proxy when the dataset is large, but for smaller datasets it could be a gross overestimate. In particular, if the survey indices are weighted on their own while the yearly recruitment is estimated, they could be overfitted. If there are two surveys within the year, @taylor2007simple suggests that the corresponding indices from each survey be weighted simultaneously, in order to ensure that there are at least two measurements for each recruiting yearclass; this is done through component grouping. In general, when there is a chance of overfitting, i.e. when the model has the flexibility to predict the observations almost perfectly, it is worthwhile to consider grouping related datasets together.
As an illustrative example, consider the model for Icelandic cod. Following the logic of @taylor2007simple, the survey indices are grouped together by size category to prevent overfitting, so that there is more than one observation for each recruitment variable. This is done with the `grouping` input variable, which takes a list of vectors, where each vector lists the names of the likelihood components from the likelihood file that should be grouped.
```{r, eval=FALSE}
gadget.iterative(main = 'main',
                 grouping = list(sind1 = c('si.gp1', 'si.gp1a'),
                                 sind2 = c('si.gp2', 'si.gp2a'),
                                 sind3 = c('si.gp3', 'si.gp3a')),
                 params.file = 'params.in',
                 wgts = 'WGTS')
```
In the above example the components `si.gp1` and `si.gp1a` receive a single common weight, and similarly for the other two groups, while all other components are weighted individually. The `wgts` argument gives the name of the directory where the intermediate results are stored. Other options to the function are described in its help page.