8 Likelihood Files
The likelihood files are used to define the various likelihood components that are used to calculate the “goodness of fit” of the Gadget model to the available data. Each likelihood component calculates a likelihood score for that individual component, and the overall likelihood score is then calculated as a weighted sum of all these component scores. It is this overall likelihood score that the optimiser attempts to minimise during an optimising run.
To define likelihood files in the Gadget model, the “main” file must contain a list of the data files that contain the description of the likelihood classes required, and the format for this is shown below:
[likelihood]
likelihoodfiles <names of the likelihood files>
The likelihood files contain a list of likelihood components, separated by the keyword [component]. Each entry gives the name, weight and type of the likelihood component, followed by various likelihood data, depending on the likelihood component type. The format of the likelihood files is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type <likelihood type>
<likelihood data>
The likelihood data for each likelihood type is covered in the subsections below. The \(<\)likelihood type\(>\) defines the type of likelihood component that is to be used, and there are currently 13 valid likelihood types defined in Gadget. These are:
BoundLikelihood
Understocking
CatchDistribution
CatchStatistics
StockDistribution
SurveyIndices
SurveyDistribution
StomachContent
Recaptures
RecStatistics
MigrationPenalty
MigrationProportion
CatchInKilos
8.1 BoundLikelihood (“Penalty”)
The BoundLikelihood likelihood component is used to give a penalty weight to parameters that have moved beyond the bounds, as specified in the parameter file, in the optimisation process. This file does not specify the bounds that are to be used, only the penalty that is to be applied when these bounds are exceeded. Since the Simulated Annealing (see section 11.2) algorithm will always choose a value for the parameter that is within the bounds, this likelihood component will return a zero likelihood score during an optimisation using that algorithm. However, both the Hooke & Jeeves (see section 11.1) and the BFGS (see section 11.3) algorithms can choose a parameter outside the specified bounds, and so this likelihood component can then return a positive score.
To specify a BoundLikelihood likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type penalty
datafile <name for the datafile>
The datafile defines the penalty that is to be applied to the parameter when it exceeds the bounds, as given by equation (8.1) below:
\[\begin{equation} \tag{8.1} \ell_{i} = \begin{cases} lw_{i} (val_{i} - lb_{i})^{p_{i}} & \textrm{if $val_{i} < lb_i$} \\ uw_{i} (val_{i} - ub_{i})^{p_{i}} & \textrm{if $val_{i} > ub_i$} \\ 0 & \textrm{otherwise} \end{cases}\end{equation}\]
where:
\(<val_i>\) is the value of the parameter
\(<lw_i>\) is the weight applied when the parameter exceeds the lower bound
\(<uw_i>\) is the weight applied when the parameter exceeds the upper bound
\(<lb_i>\) is the lower bound
\(<ub_i>\) is the upper bound
\(<p_i>\) is the power coefficient
Note that when the value of the parameter is exactly equal to the bound, this equation will give a zero likelihood score.
The datafile lists these weights and the power that is to be used for each parameter. The format for this file is shown below:
<switch> <power> <lower> <upper>
where \(<\)power\(>\) is the power coefficient, \(<\)lower\(>\) is the weighting used when the parameter falls below the lower bound, and \(<\)upper\(>\) is the weighting used when the parameter exceeds the upper bound, for the parameter with the name \(<\)switch\(>\).
It is possible to define a default penalty that is used for all switches that are not defined separately. To do this, simply enter a line in the data file with the switch name given as “default”, and then the power, lower and upper weights that are required. For example:
default 2 1000 1000
would define a default penalty, where the lower and upper weights were 1000, and the power was 2.
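As a rough illustration of equation (8.1), the penalty for a single parameter could be computed as in the Python sketch below. The function name `bound_penalty` and the example bounds are hypothetical and are not part of Gadget. The code follows the equation as written, so it assumes an even power (such as 2) to keep the lower-bound term positive.

```python
def bound_penalty(val, lb, ub, power, lower_weight, upper_weight):
    """Penalty for one parameter, following equation (8.1) as written."""
    if val < lb:
        return lower_weight * (val - lb) ** power
    if val > ub:
        return upper_weight * (val - ub) ** power
    return 0.0  # inside the bounds, or exactly on a bound


# Hypothetical parameter bounded to [0, 10], using "default 2 1000 1000".
for value in (5.0, 10.0, 10.5, -1.0):
    print(value, bound_penalty(value, 0.0, 10.0, 2, 1000.0, 1000.0))
```

A value exactly on a bound falls through to the final branch, which matches the note that the score is zero in that case.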
8.2 Understocking
The Understocking likelihood component calculates a penalty that is applied if there is an insufficient number of a particular prey to meet the requirements of the predators. In the case of a fleet, this means that the landings data indicates that more fish have been landed than there are fish in the model, for that timestep and area combination. A well defined model will have a zero likelihood score from this component. The likelihood component that is used is the sum of squares of the overconsumption, given by the equation below:
\[\begin{equation} \tag{8.2} \ell = \sum_{\it time}\sum_{\it areas} \Big(\sum_{\it preys} U_{trp} \Big)^p\end{equation}\]
where: \(<\) U \(>\) is the understocking that has occurred in the model \(<\) p \(>\) is the power coefficient (which should be 2 for sum of squares fit)
To specify an Understocking likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type understocking
powercoeff <power>
The \(<\)power\(>\) value is optional, and if this is not given, the power coefficient is assumed to be 2, giving a sum of squares equation for this likelihood component.
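Equation (8.2) can be sketched in a few lines of Python. The data layout used here (a dictionary keyed by time and area, holding one understocking value per prey) is an assumption made for illustration only, and `understocking_score` is a hypothetical name.

```python
def understocking_score(understocking, power=2):
    """Sum over time/area of (total understocking over preys) ** power.

    understocking: dict mapping (year, step, area) -> list of per-prey values.
    """
    return sum(sum(per_prey) ** power for per_prey in understocking.values())


# Hypothetical model output: no understocking in step 1, some in step 2.
example = {
    (1990, 1, 1): [0.0, 0.0],
    (1990, 2, 1): [3.0, 1.0],
}
print(understocking_score(example))
```

With the default power of 2, the second timestep contributes the square of its total understocking, while the first contributes nothing.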
8.3 CatchDistribution
The CatchDistribution likelihood component is used to compare distribution data sampled from the model with distribution data sampled from landings or surveys. The distribution data can either be aggregated into age groups (giving a distribution of length groups for each age), length groups (giving a distribution of age groups for each length) or into age-length groups. The likelihood score that is calculated gives some measure as to how well the data from the model fit to the data from the sample catches.
To specify a CatchDistribution likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type catchdistribution
datafile <name for the datafile>
function <function name>
<multivariate parameters>
aggregationlevel <0 or 1> ; 1 to aggregate data over the whole year
overconsumption <0 or 1> ; 1 to take overconsumption into account
epsilon <epsilon>
areaaggfile <area aggregation file specifying areas>
ageaggfile <age aggregation file specifying ages>
lenaggfile <length aggregation file specifying lengths>
fleetnames <vector of the names of the fleets>
stocknames <vector of the names of the stocks>
The optional flag \(<\)aggregationlevel\(>\) is used to specify whether the distribution data should be aggregated over the whole year (by setting aggregation level to 1) or not aggregated, and calculated for each timestep (by setting aggregation level to 0). If this line is not specified, then an aggregation level of 0 is assumed, and the distribution data is not aggregated over the whole year. Note that not all of the functions used to compare the data can aggregate the data over the whole year.
The optional flag \(<\)overconsumption\(>\) is used to specify whether any over consumption of the stock is to be taken into account when calculating the model distribution. If this is set to 1, then the model catch data will be adjusted to ensure that the fleets don’t catch more stock than is available, by applying a bound to the catch of the fleets. If this line is not specified, then an overconsumption of 0 is assumed and any understocking that is present in the model is ignored, which can lead to an unrealistic result if the understocking likelihood component is not specified.
The optional \(<\)epsilon\(>\) value is used whenever the calculated probability is very unlikely, although the exact format of this depends on the function that is to be used when calculating the likelihood score. This means that the likelihood component is not dominated by one or two stray values, since these will be reset back to less unlikely values. The default value for \(<\)epsilon\(>\) is 10, which is used whenever it is not defined in the input file.
The \(<\)fleetnames\(>\) vector contains a list of all the fleets to be aggregated into a single pseudo fleet for the purposes of the data comparison. Similarly, the \(<\)stocknames\(>\) vector contains a list of all the stocks to be aggregated into a single pseudo stock.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled age-length catch distribution to the input age-length catch distribution. Currently, there are 8 likelihood functions defined, and the valid function names are:
sumofsquares - use a sum of squares function
stratified - use a stratified sum of squares function
multinomial - use a multinomial function
pearson - use a Pearson function
gamma - use a gamma function
log - use a log function
mvn - use a multivariate normal function
mvlogistic - use a multivariate logistic function
The \(<\)multivariate parameters\(>\) are only required for the multivariate functions, and Gadget will generate an error if they are specified when they are not required. These parameters are described in the following sections.
Finally, the file specified by \(<\)datafile\(>\) contains a list of the age-length catch distribution that Gadget is to use to fit the likelihood function to, aggregated according to the aggregation files specified, for the numbers calculated in the model. The format of this file is given below:
<year> <step> <area> <age> <length> <number>
where \(<\)number\(>\) is the number of samples for the timestep/area/age/length combination.
8.3.1 Sum of Squares Function
The sum of squares function calculates the likelihood component from equation (8.3) below:
\[\begin{equation} \tag{8.3} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Big( \frac{N_{tral}}{N_{tr}} - \frac{\nu_{tral}}{\nu_{tr}} \Big) ^2\end{equation}\]
where:
\(< N_{tral} >\) is the data sample size for that time/area/age/length combination
\(< \nu_{tral} >\) is the model sample size for that time/area/age/length combination
\(< N_{tr} >\) and \(< \nu_{tr} >\) are the total data and model sample sizes for that time/area combination respectively
8.3.2 Stratified Sum of Squares Function
The stratified function calculates the likelihood component from equation (8.4) below:
\[\begin{equation} \tag{8.4} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Big( \frac{N_{tral}}{N_{trl}} - \frac{\nu_{tral}}{\nu_{trl}} \Big) ^2\end{equation}\]
The difference between this function and the sum of squares function above is in the way the proportions of the samples are calculated - for this function the proportion is calculated for each length group in turn, whereas for the sum of squares function the proportion is taken over all the length groups. If there is only one length group then these two functions are identical.
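The difference between equations (8.3) and (8.4) lies only in the group over which the proportions are taken, which the following Python sketch tries to make explicit. All names are hypothetical, the dictionary layout is an assumption, and treating an empty group as a proportion of zero is a choice made here rather than documented Gadget behaviour.

```python
from collections import defaultdict


def proportion_sum_of_squares(data, model, stratum):
    """Sum of squared differences between data and model proportions.

    data, model: dict mapping (time, area, age, length) -> sample size.
    stratum: function mapping a key to the group used as the denominator
             (an illustrative device, not a Gadget option).
    """
    def totals(counts):
        result = defaultdict(float)
        for key, number in counts.items():
            result[stratum(key)] += number
        return result

    data_totals, model_totals = totals(data), totals(model)
    score = 0.0
    for key in set(data) | set(model):
        group = stratum(key)
        # Assumption: an empty group contributes a proportion of zero.
        p_data = data.get(key, 0.0) / data_totals[group] if data_totals[group] else 0.0
        p_model = model.get(key, 0.0) / model_totals[group] if model_totals[group] else 0.0
        score += (p_data - p_model) ** 2
    return score


def sum_of_squares(data, model):
    # Equation (8.3): proportions taken over each time/area combination.
    return proportion_sum_of_squares(data, model, lambda k: (k[0], k[1]))


def stratified(data, model):
    # Equation (8.4): proportions taken over each time/area/length combination.
    return proportion_sum_of_squares(data, model, lambda k: (k[0], k[1], k[3]))


# One length group and two ages: both functions give the same score.
data = {(1990, 1, 1, 20): 2.0, (1990, 1, 2, 20): 2.0}
model = {(1990, 1, 1, 20): 1.0, (1990, 1, 2, 20): 3.0}
print(sum_of_squares(data, model), stratified(data, model))
```

Adding a second length group to both dictionaries makes the two scores diverge, because the denominators are then no longer the same.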
8.3.3 Multinomial Function
The multinomial function calculates the likelihood component from equation (8.5) below:
\[\begin{equation} \tag{8.5} \ell = 2 \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Bigg( \log N_{tra}! - \sum_{\it lengths} \log N_{tral}! + \sum_{\it lengths} \Big( N_{tral} \log {\frac{\nu_{tral}}{\sum \nu_{tral}}} \Big) \Bigg)\end{equation}\]
where: \(<\) N \(>\) is the data sample size for that time/area/age/length combination \(<\nu>\) is the model sample size for that time/area/age/length combination
8.3.4 Pearson Function
The Pearson function calculates the likelihood component from equation (8.6) below:
\[\begin{equation} \tag{8.6} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Big( {\frac{ ( N_{tral} - \nu_{tral} ) ^2} {\nu_{tral} + \epsilon}} \Big)\end{equation}\]
where: \(<\) N \(>\) is the data sample size for that time/area/age/length combination \(<\nu>\) is the model sample size for that time/area/age/length combination
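A minimal Python sketch of equation (8.6) is given below, mainly to show the role of \(<\)epsilon\(>\) in the denominator. The function name and the dictionary layout are assumptions made for illustration, and the default of 10 is taken from the description of \(<\)epsilon\(>\) above.

```python
def pearson_score(data, model, epsilon=10.0):
    """Pearson score; data, model map (time, area, age, length) -> sample size."""
    score = 0.0
    for key in set(data) | set(model):
        n = data.get(key, 0.0)
        nu = model.get(key, 0.0)
        # epsilon keeps the term finite when the model sample size is zero
        score += (n - nu) ** 2 / (nu + epsilon)
    return score


# Hypothetical samples: the second cell has no fish in the model.
data = {(1990, 1, 1, 20): 20.0, (1990, 1, 1, 30): 5.0}
model = {(1990, 1, 1, 20): 10.0, (1990, 1, 1, 30): 0.0}
print(pearson_score(data, model))
```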
8.3.5 Gamma Function
The gamma function calculates the likelihood component from equation (8.7) below:
\[\begin{equation} \tag{8.7} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Big( \frac{N_{tral}}{\nu_{tral} + \epsilon} + \log (\nu_{tral} + \epsilon) \Big)\end{equation}\]
where: \(<\) N \(>\) is the data sample size for that time/area/age/length combination \(<\nu>\) is the model sample size for that time/area/age/length combination
8.3.6 Log Function
The log function calculates the likelihood component from equation (8.8) below:
\[\begin{equation} \tag{8.8} \ell = \sum_{\it time}\sum_{\it areas} \Big( \log \Big( {\frac{ \displaystyle \sum_{\it ages}\sum_{\it lengths} \nu_{tral}} { \displaystyle \sum_{\it ages}\sum_{\it lengths} N_{tral}}} \Big) \Big) ^2\end{equation}\]
where: \(<\) N \(>\) is the data sample size for that time/area/age/length combination \(<\nu>\) is the model sample size for that time/area/age/length combination
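Unlike the other functions, equation (8.8) compares only the totals for each time/area combination, as the Python sketch below illustrates. The names and data layout are hypothetical, and the sketch assumes that both totals are positive for every time/area combination present in the data.

```python
import math
from collections import defaultdict


def log_score(data, model):
    """Log function score; data, model map (time, area, age, length) -> size."""
    data_totals = defaultdict(float)
    model_totals = defaultdict(float)
    for (time, area, _age, _length), number in data.items():
        data_totals[(time, area)] += number
    for (time, area, _age, _length), number in model.items():
        model_totals[(time, area)] += number
    # Assumption: both totals are positive for every time/area in the data.
    return sum(
        math.log(model_totals[tr] / data_totals[tr]) ** 2 for tr in data_totals
    )


# Hypothetical samples: the model total is twice the data total.
data = {(1990, 1, 1, 20): 4.0, (1990, 1, 1, 30): 6.0}
model = {(1990, 1, 1, 20): 20.0}
print(log_score(data, model))
```

The score is zero whenever the model and data totals agree, regardless of how the numbers are spread over ages and lengths.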
8.3.7 Multivariate Normal Function
The multivariate normal function calculates the likelihood component from equation (8.9) below:
\[\begin{equation} \tag{8.9} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Big( \log|\Sigma| + (P_{tra} - \pi_{tra})^T \Sigma^{-1}(P_{tra} - \pi_{tra}) \Big)\end{equation}\]
where: \(<\Sigma>\) is the variance-covariance matrix for the multivariate normal distribution \(<\) P \(>\) is the proportion of the data sample for that time/area/age combination \(<\pi>\) is the proportion of the model sample for that time/area/age combination
For the formulation of the variance-covariance matrix, \(<\Sigma>\) is calculated from equations (8.10) and (8.11) below:
\[\begin{equation} \tag{8.10} \Sigma = (\sigma_{ij})_{ij}\end{equation}\]
\[\begin{equation} \tag{8.11} \sigma_{ij} = \begin{cases} \displaystyle \sum^{lag}_{l=1} c_l \sigma_{i-l,j} + \delta^i_{j} \sigma^2 & \textrm{if $i \geq j$} \\ \displaystyle \sum^{lag}_{l=1} c_l \sigma_{j,i-l} & \textrm{otherwise} \end{cases}\end{equation}\]
In equation (8.11) it is assumed that the number in each length group is autocorrelated with lag \(<\) lag \(>\). Note that setting the lag to be zero simplifies the multivariate normal distribution to a univariate one.
To specify this likelihood function, it is necessary to specify the parameters \(<\sigma>\) and \(<\) lag \(>\) and a list of \(<\) lag \(>\) correlation parameters. This is done in the likelihood file, as shown below:
...
function mvn
lag <lag>
sigma <sigma>
param <correlation parameter> ; note that a total of
param <correlation parameter> ; <lag> correlation
... ; parameters are required
aggregationlevel <0 or 1> ; 1 to aggregate data over the whole year
...
8.3.8 Multivariate Logistic Function
The multivariate logistic function calculates the likelihood component from equations (8.12) and (8.13) below:
\[\begin{equation} \tag{8.12} \ell = \frac{1}{2\sigma^2} \sum_{\it time} \Big((L - 1) \log(\sigma)+\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \tau_{tral}^2 \Big)\end{equation}\]
\[\begin{equation} \tag{8.13} \tau_{tral} = \log(P_{tral}) - \log(\pi_{tral}) - \frac{1}{L}\sum_{\it lengths} \Big(\log(P_{tral}) - \log(\pi_{tral}) \Big)\end{equation}\]
where: \(<\) L \(>\) is the number of length groups \(<\) P \(>\) is the proportion of the data sample for that time/area/age/length combination \(<\pi>\) is the proportion of the model sample for that time/area/age/length combination
To specify this likelihood function it is necessary to specify the parameter \(<\sigma>\). This is done in the likelihood file as shown below:
...
function mvlogistic
sigma <sigma>
aggregationlevel <0 or 1> ; 1 to aggregate data over the whole year
...
8.4 CatchStatistics
The CatchStatistics likelihood component is used to compare statistical data sampled from the model with statistical data sampled from landings or surveys. This is typically used to compare biological data, such as the mean length at age or mean weight at age. The likelihood score that is calculated gives some measure as to how well the data from the model fits to the data from the landings.
To specify a CatchStatistics likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type catchstatistics
datafile <name for the datafile>
function <function name>
overconsumption <0 or 1> ; 1 to take overconsumption into account
areaaggfile <area aggregation file specifying areas>
lenaggfile <length aggregation file specifying lengths> ; optional, only used by the weight at length functions
ageaggfile <age aggregation file specifying ages>
fleetnames <vector of the names of the fleets>
stocknames <vector of the names of the stocks>
The optional flag \(<\)overconsumption\(>\) is used to specify whether any over consumption of the stock is to be taken into account when calculating the model statistical data. If this is set to 1, then the model catch data will be adjusted to ensure that the fleets don’t catch more stock than is available, by applying a bound to the catch of the fleets. If this line is not specified, then an overconsumption of 0 is assumed and any understocking that is present in the model is ignored, which can lead to an unrealistic result if the understocking likelihood component is not specified.
The \(<\)fleetnames\(>\) vector contains a list of all the fleets to be aggregated into a single pseudo fleet for the purposes of the data comparison. Similarly, the \(<\)stocknames\(>\) vector contains a list of all the stocks to be aggregated into a single pseudo stock.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled statistical data to the input statistical data. Currently, there are 7 likelihood functions defined, and the format of the statistical data given in the file specified by \(<\)datafile\(>\) depends on the likelihood function used. The valid functions are:
lengthcalcstddev - use a weighted sum of squares of mean length
lengthgivenstddev - use a weighted sum of squares of mean length with given standard deviation
weightgivenstddev - use a weighted sum of squares of mean weight with given standard deviation
weightnostddev - use an unweighted sum of squares of mean weight
lengthnostddev - use an unweighted sum of squares of mean length
weightgivenstddevlen - use a weighted sum of squares of mean weight at length with given standard deviation
weightnostddevlen - use an unweighted sum of squares of mean weight at length
8.4.1 Weighted Sum of Squares of Mean Length
This likelihood function calculates the likelihood score based on a weighted sum of squares of the mean length, with the weighting given by calculating the variance of length of the modelled population, as shown in equation (8.14) below:
\[\begin{equation} \tag{8.14} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Big(\frac{(x_{tra}-\mu_{tra})^2} {\sigma_{tra}^2} N_{tra}\Big)\end{equation}\]
where: \(<\) x \(>\) is the sample mean length from the data \(<\mu>\) is the mean length calculated from the model \(<\sigma>\) is the standard deviation of the length, calculated from the model \(<\) N \(>\) is the sample size
For this CatchStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<year> <step> <area> <age> <number> <mean>
where \(<\)number\(>\) is the number of samples for the timestep/area/age combination, and \(<\)mean\(>\) is the mean length of these samples.
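The sketch below illustrates one term of equation (8.14) in Python, for a single time/area/age combination. How Gadget derives the model mean and variance internally is not described here, so the sketch assumes they are the number-weighted mean and population variance over representative lengths (for example length group midpoints). All names are hypothetical.

```python
def mean_and_variance(numbers_by_length):
    """Number-weighted mean and variance of length.

    numbers_by_length: dict mapping a representative length -> model number.
    """
    total = sum(numbers_by_length.values())
    mean = sum(length * n for length, n in numbers_by_length.items()) / total
    variance = sum(
        n * (length - mean) ** 2 for length, n in numbers_by_length.items()
    ) / total
    return mean, variance


def mean_length_term(sample_mean, sample_size, numbers_by_length):
    """One term of equation (8.14) for a single time/area/age combination."""
    mu, variance = mean_and_variance(numbers_by_length)
    return (sample_mean - mu) ** 2 / variance * sample_size


# Hypothetical model population and a data row "<number> <mean>" of "8 25".
model_numbers = {10.0: 1.0, 20.0: 2.0, 30.0: 1.0}
print(mean_length_term(25.0, 8.0, model_numbers))
```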
8.4.2 Weighted Sum of Squares of Mean Length With Given Standard Deviation
This likelihood function calculates the likelihood score based on a weighted sum of squares of the mean length, with the weighting given by the variance of length of the input population, as shown in equation (8.15) below:
\[\begin{equation} \tag{8.15} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Big(\frac{(x_{tra}-\mu_{tra})^2} {s_{tra}^2} N_{tra}\Big)\end{equation}\]
where: \(<\) x \(>\) is the sample mean length from the data \(<\mu>\) is the mean length calculated from the model \(<\) s \(>\) is the standard deviation of the length from the data \(<\) N \(>\) is the sample size
For this CatchStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<year> <step> <area> <age> <number> <mean> <stddev>
where \(<\)number\(>\) is the number of samples for the timestep/area/age combination, \(<\)mean\(>\) is the mean length of these samples and \(<\)stddev\(>\) is the standard deviation of the length of these samples.
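To show how the columns of this data file feed into equation (8.15), the Python sketch below parses a few rows and accumulates the score. The sample rows, the area and age labels, the model means and the function names are all made up for illustration, and treating `;` as a comment marker in the data file is an assumption based on its use in the likelihood file templates.

```python
def parse_mean_length_rows(text):
    """Parse '<year> <step> <area> <age> <number> <mean> <stddev>' rows."""
    rows = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # assumed ';' comment convention
        if not line:
            continue
        year, step, area, age, number, mean, stddev = line.split()
        rows[(int(year), int(step), area, age)] = (
            float(number), float(mean), float(stddev),
        )
    return rows


def length_given_stddev_score(rows, model_means):
    """Weighted sum of squares of mean length, as in equation (8.15)."""
    score = 0.0
    for key, (number, mean, stddev) in rows.items():
        score += (mean - model_means[key]) ** 2 / stddev ** 2 * number
    return score


sample = """
1990 1 area1 age3 10 30.0 2.0
1990 1 area1 age4 20 40.0 4.0
"""
rows = parse_mean_length_rows(sample)
# Hypothetical model mean lengths for the same combinations.
model_means = {(1990, 1, "area1", "age3"): 32.0, (1990, 1, "area1", "age4"): 40.0}
print(length_given_stddev_score(rows, model_means))
```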
8.4.3 Weighted Sum of Squares of Mean Weight With Given Standard Deviation
This likelihood function calculates the likelihood score based on a weighted sum of squares of the mean weight, with the weighting given by the variance of weight of the input population, as shown in equation (8.16) below:
\[\begin{equation} \tag{8.16} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Big(\frac{(x_{tra}-\mu_{tra})^2} {s_{tra}^2} N_{tra}\Big)\end{equation}\]
where: \(<\) x \(>\) is the sample mean weight from the data \(<\mu>\) is the mean weight calculated from the model \(<\) s \(>\) is the standard deviation of the weight from the data \(<\) N \(>\) is the sample size
For this CatchStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<year> <step> <area> <age> <number> <mean> <stddev>
where \(<\)number\(>\) is the number of samples for the timestep/area/age combination, \(<\)mean\(>\) is the mean weight of these samples and \(<\)stddev\(>\) is the standard deviation of the weight of these samples.
8.4.4 Unweighted Sum of Squares of Mean Weight
This likelihood function calculates the likelihood score based on an unweighted sum of squares of the mean weight, with the variance of the weight of the population assumed to be 1, as shown in equation (8.17) below:
\[\begin{equation} \tag{8.17} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Big((x_{tra}-\mu_{tra})^2 N_{tra}\Big)\end{equation}\]
where: \(<\) x \(>\) is the sample mean weight from the data \(<\mu>\) is the mean weight calculated from the model \(<\) N \(>\) is the sample size
For this CatchStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<year> <step> <area> <age> <number> <mean>
where \(<\)number\(>\) is the number of samples for the timestep/area/age combination, and \(<\)mean\(>\) is the mean weight of these samples.
8.4.5 Unweighted Sum of Squares of Mean Length
This likelihood function calculates the likelihood score based on an unweighted sum of squares of the mean length, with the variance of the length of the population assumed to be 1, as shown in equation (8.18) below:
\[\begin{equation} \tag{8.18} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages} \Big((x_{tra}-\mu_{tra})^2 N_{tra}\Big)\end{equation}\]
where: \(<\) x \(>\) is the sample mean length from the data \(<\mu>\) is the mean length calculated from the model \(<\) N \(>\) is the sample size
For this CatchStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<year> <step> <area> <age> <number> <mean>
where \(<\)number\(>\) is the number of samples for the timestep/area/age combination, and \(<\)mean\(>\) is the mean length of these samples.
8.5 StockDistribution
The StockDistribution likelihood component is used to compare distribution data sampled from the model with distribution data sampled from landings or surveys for different stocks within the Gadget model. This is typically used to compare Gadget stocks that are based on the same species, but have differing biological properties (e.g. immature and mature fish). The distribution data can either be aggregated into age groups (giving a distribution of length groups for each age), length groups (giving a distribution of age groups for each length) or into age-length groups. The likelihood score that is calculated gives some measure as to how well the data from the model fits to the data from the landings.
To specify a StockDistribution likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type stockdistribution
datafile <name for the datafile>
function <function name>
aggregationlevel <0 or 1> ; 1 to aggregate data over the whole year
overconsumption <0 or 1> ; 1 to take overconsumption into account
epsilon <epsilon>
areaaggfile <area aggregation file specifying areas>
ageaggfile <age aggregation file specifying ages>
lenaggfile <length aggregation file specifying lengths>
fleetnames <vector of the names of the fleets>
stocknames <vector of the names of the stocks>
The optional flag \(<\)aggregationlevel\(>\) is used to specify whether the distribution data should be aggregated over the whole year (by setting aggregation level to 1) or not aggregated, and calculated for each timestep (by setting aggregation level to 0). If this line is not specified, then an aggregation level of 0 is assumed, and the distribution data is not aggregated over the whole year. Note that not all of the functions used to compare the data can aggregate the data over the whole year.
The optional flag \(<\)overconsumption\(>\) is used to specify whether any over consumption of the stock is to be taken into account when calculating the model distribution. If this is set to 1, then the model catch data will be adjusted to ensure that the fleets don’t catch more stock than is available, by applying a bound to the catch of the fleets. If this line is not specified, then an overconsumption of 0 is assumed and any understocking that is present in the model is ignored, which can lead to an unrealistic result if the understocking likelihood component is not specified.
The optional \(<\)epsilon\(>\) value is used whenever the calculated probability is very unlikely, although the exact format of this depends on the function that is to be used when calculating the likelihood score. This means that the likelihood component is not dominated by one or two stray values, since these will be reset back to less unlikely values. The default value for \(<\)epsilon\(>\) is 10, which is used whenever it is not defined in the input file.
The \(<\)fleetnames\(>\) vector contains a list of all the fleets to be aggregated into a single pseudo fleet for the purposes of the data comparison. However, the \(<\)stocknames\(>\) vector contains a list of all the stocks to be compared for the data comparison. These stocks are not aggregated into a single pseudo stock.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled age-length stock distribution to the input age-length stock distribution. Currently, there are two likelihood functions defined, and the valid functions are:
sumofsquares - use a sum of squares function
multinomial - use a multinomial function
Finally, the datafile is a list of the age-length catch distribution for each stock, that Gadget is to use to fit the likelihood function to, aggregated according to the aggregation files specified, for the numbers calculated in the model. The format of this file is given below:
<year> <step> <area> <stock> <age> <length> <number>
where \(<\)number\(>\) is the number of samples for the timestep/area/stock/age/length combination.
8.5.1 Sum of Squares Function
The sum of squares function calculates the likelihood component from equation (8.19) below:
\[\begin{equation} \tag{8.19} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths}\sum_{\it stocks} \Big( \frac{N_{trals}}{N_{tr}} - \frac{\nu_{trals}}{\nu_{tr}} \Big) ^2\end{equation}\]
where:
\(< N_{trals} >\) is the data sample size for that time/area/age/length/stock combination
\(< \nu_{trals} >\) is the model sample size for that time/area/age/length/stock combination
\(< N_{tr} >\) and \(< \nu_{tr} >\) are the total data and model sample sizes for that time/area combination respectively
8.5.2 Multinomial Function
The multinomial function calculates the likelihood component from equation (8.20) below:
\[\begin{equation} \tag{8.20} \ell = 2 \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Bigg( \log N_{tral}! - \sum_{\it stocks} \log N_{trals}! + \sum_{\it stocks} \Big( N_{trals} \log {\frac{\nu_{trals}}{\sum \nu_{trals}}} \Big)\Bigg)\end{equation}\]
where: \(<\) N \(>\) is the data sample size for that time/area/age/length/stock combination \(<\nu>\) is the model sample size for that time/area/age/length/stock combination
8.6 SurveyIndices
The SurveyIndices likelihood component is used to compare the development of a stock in the Gadget model to indices calculated from a standardized survey for that stock. These indices can be aggregated into length groups or age groups. The likelihood component that is used is the sum of squares of a linear regression fitted to the difference between the modelled data and the specified index, given by equation (8.21) below:
\[\begin{equation} \tag{8.21} \ell = \sum_{\it time}\Big(I_{t} - (\alpha + \beta N_{t})\Big)^2\end{equation}\]
where: \(<\) I \(>\) is the observed survey index \(<\) N \(>\) is the corresponding index calculated in the Gadget model
The exact format of this linear regression equation will vary, depending on the survey index data available. It is possible to take the log of the indices and the modelled data before fitting the linear regression line. The intercept and slope of the linear regression line are given by the parameters alpha and beta respectively, and it is possible to fix these to specified numbers, or let Gadget calculate these to get the best fit to the modelled data.
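For the case where neither parameter is fixed, the regression in equation (8.21) can be sketched with ordinary least squares, as in the Python example below. This is an illustration under stated assumptions rather than a description of Gadget's internals: the function name and the `use_log` switch are hypothetical, and the sketch assumes at least two distinct modelled values (and positive values when logs are taken).

```python
import math


def linear_fit_score(model_index, survey_index, use_log=False):
    """Least-squares fit of I = alpha + beta * N; returns (alpha, beta, score)."""
    xs, ys = list(model_index), list(survey_index)
    if use_log:
        # Take logs of both the modelled data and the indices before fitting.
        xs = [math.log(x) for x in xs]
        ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # intercept
    score = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    return alpha, beta, score


# Hypothetical modelled values N_t and survey indices I_t on an exact line.
alpha, beta, score = linear_fit_score([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(alpha, beta, score)
```

The returned score is the sum of squared residuals, which corresponds to the likelihood score in equation (8.21).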
To specify a SurveyIndices likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveyindices
datafile <name for the datafile>
sitype <survey index type>
biomass <0 or 1> ; 1 to base index data on biomass
<survey index data>
The optional flag \(<\)biomass\(>\) is used to specify whether the index data should be based on the biomass of the stock or on the population numbers for the stock. If this is set to 1, then the index data calculated in the model will be based on the available biomass of the stock. If this line is not specified, then a biomass value of 0 is assumed and the index data calculated in the model will be based on the available population numbers for the stock.
The format of the survey index data, and the contents of the datafile, depend on the type of survey index that is to be used, which is specified by the value of \(<\)survey index type\(>\). There are currently 5 valid options, which are:
lengths - defining a length group based survey index
ages - defining an age group based survey index
fleets - defining a length group based survey index, taking the fleet selectivity into account
acoustic - defining an acoustic based survey index
effort - defining a fishing effort based survey index
8.6.1 SurveyIndices by Length
To specify a length group based SurveyIndices likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveyindices
datafile <name for the datafile>
sitype lengths
biomass <0 or 1> ; 1 to base index data on biomass
areaaggfile <area aggregation file specifying areas>
lenaggfile <length aggregation file specifying lengths>
stocknames <vector of the names of the stocks>
fittype <fit type>
<fit type parameters>
The datafile is a list of the indices that Gadget is to use to fit the linear regression to, aggregated according to the length aggregation file specified, for the population numbers calculated in the model. The format of this file is given below:
<year> <step> <area> <length> <number>
where \(<\)number\(>\) is the survey index for that timestep/area/length combination.
The \(<\)fit type\(>\) defines the type of linear regression equation to be used to calculate the likelihood score for this likelihood component. These options specify whether or not the log of the numbers is to be used, and whether the parameters alpha and beta are to be estimated by Gadget, or fixed. If these parameters are to be fixed, then they are specified here. In total, there are 8 valid entries for \(<\)fit type\(>\), and the associated parameters, and these are:
linearfit
loglinearfit
fixedslopelinearfit
fixedslopeloglinearfit
fixedinterceptlinearfit
fixedinterceptloglinearfit
fixedlinearfit
fixedloglinearfit
8.6.1.1 linear regression, estimating both slope and intercept
This fit type will fit a linear regression line, with the alpha and beta parameter values estimated from the data within the Gadget model. The file format for this fit type is given below:
fittype linearfit
8.6.1.2 log linear regression, estimating both slope and intercept
This fit type will fit a log linear regression line, with the alpha and beta parameter values estimated from the data within the Gadget model. The file format for this fit type is given below:
fittype loglinearfit
8.6.1.3 linear regression, fixing slope and estimating intercept
This fit type will fit a linear regression line, with the alpha parameter value estimated from the data within the Gadget model, and the beta parameter value specified in the input file. The file format for this fit type is given below:
fittype fixedslopelinearfit
slope <beta>
8.6.1.4 log linear regression, fixing slope and estimating intercept
This fit type will fit a log linear regression line, with the alpha parameter value estimated from the data within the Gadget model, and the beta parameter value specified in the input file. The file format for this fit type is given below:
fittype fixedslopeloglinearfit
slope <beta>
8.6.1.5 linear regression, fixing intercept and estimating slope
This fit type will fit a linear regression line, with the beta parameter value estimated from the data within the Gadget model, and the alpha parameter value specified in the input file. The file format for this fit type is given below:
fittype fixedinterceptlinearfit
intercept <alpha>
8.6.1.6 log linear regression, fixing intercept and estimating slope
This fit type will fit a log linear regression line, with the beta parameter value estimated from the data within the Gadget model, and the alpha parameter value specified in the input file. The file format for this fit type is given below:
fittype fixedinterceptloglinearfit
intercept <alpha>
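To make the difference between these fit types concrete, the following is a minimal Python sketch of a least-squares fit in which alpha and beta are either estimated or fixed. The function name `fit_survey_index`, its arguments, and the use of the sum of squared residuals as the returned score are assumptions made for illustration; this is not Gadget's own implementation.

```python
import math

def fit_survey_index(model, index, log=False, slope=None, intercept=None):
    """Fit index = alpha + beta * model by least squares (hypothetical helper).

    log=True fits log(index) = alpha + beta * log(model) instead, so all
    values must then be positive. A given slope fixes beta, a given
    intercept fixes alpha; anything not given is estimated from the data.
    Returns (alpha, beta, sum of squared residuals).
    """
    x = [math.log(v) for v in model] if log else list(model)
    y = [math.log(v) for v in index] if log else list(index)
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    if slope is None and intercept is None:
        # like linearfit / loglinearfit: estimate both parameters
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        beta = sxy / sxx
        alpha = ybar - beta * xbar
    elif intercept is None:
        # like fixedslope...: beta is given, estimate alpha
        beta = slope
        alpha = ybar - beta * xbar
    elif slope is None:
        # like fixedintercept...: alpha is given, estimate beta
        alpha = intercept
        beta = sum(xi * (yi - alpha) for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    else:
        # both parameters given, nothing is estimated
        alpha, beta = intercept, slope
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, sse
```

For example, `fit_survey_index(model, index, log=True, slope=1.0)` would mimic a fixed slope log linear fit with beta set to 1.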
8.6.2 SurveyIndices by Age
To specify an age group based SurveyIndices likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveyindices
datafile <name for the datafile>
sitype ages
biomass <0 or 1> ; 1 to base index data on biomass
areaaggfile <area aggregation file specifying areas>
ageaggfile <age aggregation file specifying ages>
stocknames <vector of the names of the stocks>
fittype <fit type>
<fit type parameters>
The datafile is a list of the indices that Gadget is to use to fit the linear regression to, aggregated according to the age aggregation file specified, for the population numbers calculated in the model. The format of this file is given below:
<year> <step> <area> <age> <number>
where \(<\)number\(>\) is the survey index for that timestep/area/age combination.
The \(<\)fit type\(>\) defines the type of linear regression equation to be used to calculate the likelihood score for this likelihood component. The valid fit type options are the same as for the length based survey indices, given in section 8.6.1 above.
8.6.3 SurveyIndices by Fleet
To specify a length group based SurveyIndices likelihood component taking the fleet selectivity into account, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveyindices
datafile <name for the datafile>
sitype fleets
biomass <0 or 1> ; 1 to base index data on biomass
areaaggfile <area aggregation file specifying areas>
lenaggfile <length aggregation file specifying lengths>
fleetnames <vector of the names of the fleets>
stocknames <vector of the names of the stocks>
fittype <fit type>
<fit type parameters>
The datafile is a list of the indices that Gadget is to use to fit the linear regression to, aggregated according to the length aggregation file specified, for the population numbers calculated in the model. The format of this file is given below:
<year> <step> <area> <length> <number>
where \(<\)number\(>\) is the survey index for that timestep/area/length combination.
The \(<\)fit type\(>\) defines the type of linear regression equation to be used to calculate the likelihood score for this likelihood component. The valid fit type options are the same as for the length based survey indices, given in section 8.6.1 above.
8.6.4 SurveyIndices by Acoustic
To specify an acoustic based SurveyIndices likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveyindices
datafile <name for the datafile>
sitype acoustic
biomass <0 or 1> ; 1 to base index data on biomass
areaaggfile <area aggregation file specifying areas>
surveynames <vector of the names of the acoustic surveys>
stocknames <vector of the names of the stocks>
fittype <fit type>
<fit type parameters>
The datafile is a list of the acoustic indices that Gadget is to use to fit the linear regression to, for the population calculated in the model. The format of this file is given below:
<year> <step> <area> <survey> <acoustic>
where \(<\)acoustic\(>\) is the acoustic index for that timestep/area/survey combination.
The \(<\)fit type\(>\) defines the type of linear regression equation to be used to calculate the likelihood score for this likelihood component. The valid fit type options are the same as for the length based survey indices, given in section 8.6.1 above.
8.6.5 SurveyIndices by Effort
To specify an effort based SurveyIndices likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveyindices
datafile <name for the datafile>
sitype effort
biomass <0 or 1> ; 1 to base index data on biomass
areaaggfile <area aggregation file specifying areas>
fleetnames <vector of the names of the fleets>
stocknames <vector of the names of the stocks>
fittype <fit type>
<fit type parameters>
The datafile is a list of the fleet effort indices that Gadget is to use to fit the linear regression to, for the fishing effort calculated in the model. The format of this file is given below:
<year> <step> <area> <fleet> <effort>
where \(<\)effort\(>\) is the effort index for that timestep/area/fleet combination.
The \(<\)fit type\(>\) defines the type of linear regression equation to be used to calculate the likelihood score for this likelihood component. The valid fit type options are the same as for the length based survey indices, given in section 8.6.1 above.
8.7 SurveyDistribution
The SurveyDistribution likelihood component is used to compare the development of a stock in the Gadget model to age-length indices calculated from a survey for that stock. The likelihood score that is calculated gives some measure as to how well the data from the model fits to the data from the calculated survey index distribution.
To specify a SurveyDistribution likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type surveydistribution
datafile <name for the datafile>
areaaggfile <area aggregation file specifying areas>
lenaggfile <length aggregation file specifying lengths>
ageaggfile <age aggregation file specifying ages>
stocknames <vector of the names of the stocks>
fittype <fit type>
parameters <fit type parameters>
<suitability parameters>
epsilon <epsilon>
likelihoodtype <likelihood type>
The \(<\)stocknames\(>\) vector contains a list of all the stocks to be aggregated into a single pseudo stock for the purposes of the data comparison. The \(<\)suitability parameters\(>\) define the suitability of the survey fleet that was used to collect the survey index data. This is the same format as the suitability functions for the stock, as discussed in section 4.7.3 above. Note that only one set of suitability values is defined, which will be applied to all the stocks for this likelihood component.
The \(<\)fit type\(>\) defines what function is to be used to calculate the survey index distribution from the modelled population. Currently, there are two functions defined, and the valid function names are:
linearfit
- use a linear function
powerfit
- use a power function
The \(<\)fit type parameters\(>\) is a vector of 2 parameters that are used to calculate the survey index values from the modelled population. The \(<\)epsilon\(>\) value is used whenever the calculated probability is very unlikely, although the exact format of this depends on the likelihood type that is to be used when calculating the likelihood score.
The \(<\)likelihood type\(>\) defines what function is to be used to compare the modelled survey index distribution to the input survey index distribution. Currently, there are 4 functions defined, and the valid function names are:
multinomial
- use a multinomial function
pearson
- use a Pearson function
gamma
- use a gamma function
log
- use a log function
Finally, the file specified by \(<\)datafile\(>\) contains a list of the age-length survey indices that Gadget is to use to fit the likelihood function to, aggregated according to the aggregation files specified, for the numbers calculated in the model. The format of this file is given below:
<year> <step> <area> <age> <length> <number>
where \(<\)number\(>\) is the survey index for the timestep/area/age/length combination.
8.7.1 Linear Fit
The linear fit function calculates the survey index for the modelled population from equation (8.22) below:
\[\begin{equation} \tag{8.22} \widehat{I}_{tral} = q_{0} S_{l} \big( N_{tral} + q_{1} \big)\end{equation}\]
where:
\(<\) S \(>\) is the calculated suitability value for that length group
\(<\) N \(>\) is the model population for that time/area/age/length combination
8.7.2 Power Fit
The power fit function calculates the survey index for the modelled population from equation (8.23) below:
\[\begin{equation} \tag{8.23} \widehat{I}_{tral} = q_{0} S_{l} N_{tral} ^{q_{1}}\end{equation}\]
where:
\(<\) S \(>\) is the calculated suitability value for that length group
\(<\) N \(>\) is the model population for that time/area/age/length combination
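The two fit functions can be sketched in a few lines of Python. The sketch assumes that q0 and q1 are the two values given in the fit type parameters vector, in that order; the function name `survey_index` is hypothetical and only mirrors equations (8.22) and (8.23).

```python
def survey_index(n, suitability, q0, q1, fittype="linearfit"):
    """Model survey index for one time/area/age/length cell.

    n           -- model population N for the cell
    suitability -- suitability value S for the length group
    q0, q1      -- assumed to be the two fit type parameters
    """
    if fittype == "linearfit":
        # equation (8.22): q0 * S * (N + q1)
        return q0 * suitability * (n + q1)
    if fittype == "powerfit":
        # equation (8.23): q0 * S * N^q1
        return q0 * suitability * n ** q1
    raise ValueError("unknown fit type: " + fittype)
```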
8.7.3 Multinomial Function
The multinomial function calculates the likelihood component from equation (8.24) below:
\[\begin{equation} \tag{8.24} \ell = \sum_{\it time}\sum_{\it areas} \bigg( \log \big(\sum_{\it ages}\sum_{\it lengths} \widehat{I}_{tral} \big) - {\frac { \displaystyle \sum_{\it ages}\sum_{\it lengths} \big(\widehat{I}_{tral} \log (I_{tral} + \epsilon) \big)} { \displaystyle \sum_{\it ages}\sum_{\it lengths} I_{tral}} } \bigg)\end{equation}\]
where:
\(<\) I \(>\) is the data survey index for that time/area/age/length combination
\(<\widehat{I}>\) is the model survey index for that time/area/age/length combination
8.7.4 Pearson Function
The Pearson function calculates the likelihood component from equation (8.25) below:
\[\begin{equation} \tag{8.25} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Big( {\frac{ ( I_{tral} - \widehat{I}_{tral} ) ^2} {\widehat{I}_{tral} + \epsilon}} \Big)\end{equation}\]
where:
\(<\) I \(>\) is the data survey index for that time/area/age/length combination
\(<\widehat{I}>\) is the model survey index for that time/area/age/length combination
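As a small illustration of equation (8.25), the sketch below sums the Pearson term over a flat list of (data, model) index pairs. The function name `pearson_score` and the list-of-pairs layout are assumptions for the example, not Gadget's internal data structures.

```python
def pearson_score(pairs, epsilon):
    """pairs: iterable of (I, I_hat) for every time/area/age/length cell."""
    return sum((i - i_hat) ** 2 / (i_hat + epsilon) for i, i_hat in pairs)
```

Here epsilon keeps the denominator away from zero when the modelled index for a cell is zero.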
8.7.5 Gamma Function
The gamma function calculates the likelihood component from equation (8.26) below:
\[\begin{equation} \tag{8.26} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it ages}\sum_{\it lengths} \Big( \frac{I_{tral}}{\widehat{I}_{tral} + \epsilon} + \log \big( \widehat{I}_{tral} + \epsilon \big) \Big)\end{equation}\]
where:
\(<\) I \(>\) is the data survey index for that time/area/age/length combination
\(<\widehat{I}>\) is the model survey index for that time/area/age/length combination
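Equation (8.26) can be transcribed in the same way. The function name `gamma_score` and the list-of-pairs input are again assumptions made for this sketch.

```python
import math

def gamma_score(pairs, epsilon):
    """pairs: iterable of (I, I_hat) for every time/area/age/length cell."""
    return sum(
        i / (i_hat + epsilon) + math.log(i_hat + epsilon)
        for i, i_hat in pairs
    )
```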
8.7.6 Log Function
The log function calculates the likelihood component from equation (8.27) below:
\[\begin{equation} \tag{8.27} \ell = \sum_{\it time}\sum_{\it areas} \Big( \log \Big( {\frac{ \displaystyle \sum_{\it ages}\sum_{\it lengths} \widehat{I}_{tral}} { \displaystyle \sum_{\it ages}\sum_{\it lengths} I_{tral}}} \Big) \Big) ^2\end{equation}\]
where:
\(<\) I \(>\) is the data survey index for that time/area/age/length combination
\(<\widehat{I}>\) is the model survey index for that time/area/age/length combination
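Unlike the Pearson and gamma functions, the log function in equation (8.27) first totals the indices over all ages and lengths within each time/area combination. A minimal sketch, assuming the data is held in a dictionary keyed by (year, step, area); the function name `log_function_score` is hypothetical:

```python
import math

def log_function_score(groups):
    """groups: {(year, step, area): [(I, I_hat), ...]} over all ages and lengths."""
    score = 0.0
    for pairs in groups.values():
        data_total = sum(i for i, _ in pairs)
        model_total = sum(i_hat for _, i_hat in pairs)
        # squared log of the ratio of model total to data total
        score += math.log(model_total / data_total) ** 2
    return score
```

Because only the totals enter the score, this function is insensitive to how the index is distributed across ages and lengths within a time/area combination.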
8.8 StomachContent
The StomachContent likelihood component is used to compare consumption data sampled from the model with stomach content data obtained by analysing the stomach contents of various predators. This data can be used to give an indication of the diet composition of the stock. The likelihood score that is calculated gives some measure as to how well the consumption data from the model fits to the data from the stomach contents. Care is needed when making this comparison, since the data will give information on the stomach content at the time of capture of the predator, whereas the Gadget simulation can only give information about the modelled consumption of the prey by the predator.
To specify a StomachContent likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type stomachcontent
function <function name>
datafile <name for the datafile>
epsilon <epsilon>
areaaggfile <area aggregation file specifying areas>
predatornames <vector of the names of the predators>
predatorlengths
lenaggfile <length aggregation file specifying predator lengths>
preyaggfile <prey aggregation file specifying preys>
The optional \(<\)epsilon\(>\) value is used whenever the calculated probability is very unlikely, although the exact format of this depends on the function that is to be used when calculating the likelihood score. This means that the likelihood component is not dominated by one or two stray values, since these will be reset back to less unlikely values. The default value for \(<\)epsilon\(>\) is 10, which is used whenever it is not defined in the input file.
The \(<\)predatornames\(>\) vector contains a list of all the predators to be aggregated into a single pseudo predator for the purposes of the data comparison.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled consumption data to the input stomach content data. Currently, there is only one likelihood function defined, so the valid function name is:
scsimple
- use a simple ratio function
Finally, the file specified by \(<\)datafile\(>\) contains a list of the stomach content data that Gadget is to use to fit the likelihood function to, aggregated according to the aggregation files specified, for the consumption calculated in the model. The format of this file is given below:
<year> <step> <area> <predator> <prey> <ratio>
where \(<\)ratio\(>\) is the ratio of prey \(<\)prey\(>\) in the stomachs of predator \(<\)predator\(>\) for the timestep/area combination, where \(<\)prey\(>\) is defined in the prey aggregation file, and \(<\)predator\(>\) is defined in the predator length aggregation file.
8.8.1 SCSimple Function
The scsimple function calculates the likelihood component by comparing the ratio of the consumption of different preys by a predator in the model to the ratio of the preys found in the stomach contents data specified in the input file, as shown in equation (8.28) below:
\[\begin{equation} \tag{8.28} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it predators}\sum_{\it preys} \Big( P_{trpp} - \pi_{trpp} \Big) ^2\end{equation}\]
where:
\(<\) P \(>\) is the ratio of the stomach content data for that time/area/predator/prey combination
\(<\pi>\) is the ratio of the modelled consumption for that time/area/predator/prey combination
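As a sketch of equation (8.28), the function below compares two dictionaries of ratios keyed by (year, step, area, predator, prey), matching the columns of the datafile. The function name `scsimple_score` and the dictionary layout are assumptions for illustration only.

```python
def scsimple_score(data_ratios, model_ratios):
    """Sum of squared differences between data and modelled prey ratios.

    Both arguments map (year, step, area, predator, prey) to a ratio;
    every key in data_ratios must also be present in model_ratios.
    """
    return sum((p - model_ratios[key]) ** 2 for key, p in data_ratios.items())
```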
8.9 Recaptures
The Recaptures likelihood component is used to compare recaptures data from tagging experiments within the model with recaptures data obtained from tagging experiments, aggregated according to length at recapture. The likelihood score that is calculated gives some measure as to how well the data from the model fits the recaptures data.
To specify a Recaptures likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type recaptures
datafile <name for the datafile>
function <function name>
areaaggfile <area aggregation file specifying areas>
lenaggfile <length aggregation file specifying recapture lengths>
fleetnames <vector of the names of the fleets>
The \(<\)fleetnames\(>\) vector contains a list of all the fleets to be aggregated into a single pseudo fleet for the purposes of the data comparison.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled recaptures data to the input recaptures data. Currently, there is only one likelihood function defined, so the only valid function name is:
poisson
- use a Poisson function
Finally, the datafile is a list of the recaptures that Gadget is to use to fit the likelihood function to, aggregated according to the aggregation files specified, for the numbers calculated in the model. The format of this file is given below:
<tagid> <year> <step> <area> <length> <number>
where \(<\)number\(>\) is the number of recaptures for the tag/timestep/area/length combination.
8.9.1 Poisson Function
The Poisson function calculates the likelihood component from equation (8.29) below:
\[\begin{equation} \tag{8.29} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it lengths} \Big( N_{trl} + \log \nu_{trl}! - N_{trl} \log \nu_{trl} \Big)\end{equation}\]
where:
\(<\) N \(>\) is the number of observed recaptures for that time/area/length combination
\(<\nu>\) is the number of modelled recaptures for that time/area/length combination
8.10 RecStatistics
The RecStatistics likelihood component is used to compare statistical data sampled from tagged subpopulations within the model with statistical data obtained from the fish returned from tagging experiments. This is used to compare biological data, such as the mean length at age, and is similar to the CatchStatistics likelihood component (see section 8.4). The likelihood score that is calculated gives some measure as to how well the data from the model fits to the data from the recaptures.
To specify a RecStatistics likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type recstatistics
datafile <name for the datafile>
function <function name>
areaaggfile <area aggregation file specifying areas>
fleetnames <vector of the names of the fleets>
The \(<\)fleetnames\(>\) vector contains a list of all the fleets to be aggregated into a single pseudo fleet for the purposes of the data comparison.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled statistical data to the input statistical data. Currently, there are three likelihood functions defined, and the format of the statistical data given in the file specified by \(<\)datafile\(>\) depends on the likelihood function used. The valid functions are:
lengthcalcstddev
- use a weighted sum of squares of mean length
lengthgivenstddev
- use a weighted sum of squares of mean length with given standard deviation
lengthnostddev
- use an unweighted sum of squares of mean length
8.10.1 Weighted Sum of Squares of Mean Length
This likelihood function calculates the likelihood score based on a weighted sum of squares of the mean length, with the weighting given by calculating the variance of length of the modelled population, as shown in equation (8.30) below:
\[\begin{equation} \tag{8.30} \ell = \sum_{\it tags}\sum_{\it time}\sum_{\it areas} \Big(\frac{(x-\mu)^2} {\sigma^2} N\Big)\end{equation}\]
where:
\(<\) x \(>\) is the sample mean length from the data
\(<\mu>\) is the mean length calculated from the model
\(<\sigma>\) is the standard deviation of the length, calculated from the model
\(<\) N \(>\) is the sample size
For this RecStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<tagid> <year> <step> <area> <number> <mean>
where \(<\)number\(>\) is the number of samples for the tag/timestep/area combination, and \(<\)mean\(>\) is the mean length of these samples.
8.10.2 Weighted Sum of Squares of Mean Length With Given Standard Deviation
This likelihood function calculates the likelihood score based on a weighted sum of squares of the mean length, with the weighting given by the variance of length of the input population, as shown in equation (8.31) below:
\[\begin{equation} \tag{8.31} \ell = \sum_{\it tags}\sum_{\it time}\sum_{\it areas} \Big(\frac{(x-\mu)^2} {s^2} N\Big)\end{equation}\]
where:
\(<\) x \(>\) is the sample mean length from the data
\(<\mu>\) is the mean length calculated from the model
\(<\) s \(>\) is the standard deviation of the length from the data
\(<\) N \(>\) is the sample size
For this RecStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<tagid> <year> <step> <area> <number> <mean> <stddev>
where \(<\)number\(>\) is the number of samples for the tag/timestep/area combination, \(<\)mean\(>\) is the mean length of these samples and \(<\)stddev\(>\) is the standard deviation of the length of these samples.
8.10.3 Unweighted Sum of Squares of Mean Length
This likelihood function calculates the likelihood score based on an unweighted sum of squares of the mean length, with the variance of the length of the population assumed to be 1, as shown in equation (8.32) below:
\[\begin{equation} \tag{8.32} \ell = \sum_{\it tags}\sum_{\it time}\sum_{\it areas} \Big((x-\mu)^2 N\Big)\end{equation}\]
where:
\(<\) x \(>\) is the sample mean length from the data
\(<\mu>\) is the mean length calculated from the model
\(<\) N \(>\) is the sample size
For this RecStatistics function, the format of the statistical data required in the file specified by \(<\)datafile\(>\) is given below:
<tagid> <year> <step> <area> <number> <mean>
where \(<\)number\(>\) is the number of samples for the tag/timestep/area combination, and \(<\)mean\(>\) is the mean length of these samples.
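The three RecStatistics functions differ only in the variance used to weight the squared difference of the mean lengths. The sketch below makes that explicit; the function name `rec_statistics_score` and the dictionary keys are hypothetical names chosen for this example.

```python
def rec_statistics_score(samples, function="lengthcalcstddev"):
    """Score a list of tag/timestep/area samples.

    Each sample is a dict with 'mean' (x), 'model_mean' (mu) and
    'number' (N), plus 'model_stddev' (sigma) for lengthcalcstddev
    or 'stddev' (s) for lengthgivenstddev.
    """
    score = 0.0
    for sample in samples:
        if function == "lengthcalcstddev":
            variance = sample["model_stddev"] ** 2   # equation (8.30)
        elif function == "lengthgivenstddev":
            variance = sample["stddev"] ** 2         # equation (8.31)
        elif function == "lengthnostddev":
            variance = 1.0                           # equation (8.32)
        else:
            raise ValueError("unknown function: " + function)
        diff = sample["mean"] - sample["model_mean"]
        score += diff ** 2 / variance * sample["number"]
    return score
```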
8.11 MigrationPenalty
The MigrationPenalty likelihood component is used to give a penalty whenever there is a negative migration value from the migration matrices (which is meaningless). The MigrationPenalty component is used (rather than the BoundLikelihood component) since the values in the migration matrices are calculated from more than one parameter, and it is not necessarily the individual parameters that are wrong, rather the combination of the parameters that give the migration matrix value that is wrong. The likelihood component that is used is based on the sum of squares of the migration values, given by the equation below:
\[\begin{equation} \tag{8.33} \ell = \left( \sum_{ij}^{} M_{ij}^{p_0} \right)^{p_1}\end{equation}\]
The use of 2 power coefficients gives increased flexibility for the likelihood component. In general, a higher value of \(p_1\) applies a higher penalty to “many small negative values”, whereas a higher value of \(p_0\) applies a higher penalty to “few large negative values”. For a simple sum of squares of the migration matrix values, \(p_0\) should be set to 2, and \(p_1\) should be set to 1.
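The effect of the two power coefficients can be explored with a short Python sketch of equation (8.33). Two assumptions are made here that the text does not state explicitly: only the negative entries of the migration matrix contribute to the sum, and their absolute value is used so that non-integer powers remain defined. The function name `migration_penalty` is hypothetical.

```python
def migration_penalty(matrix, p0=2.0, p1=1.0):
    """Penalty for negative entries of a migration matrix (list of rows).

    Assumption: only negative values M_ij are penalised, using |M_ij|.
    """
    total = sum(abs(m) ** p0 for row in matrix for m in row if m < 0)
    return total ** p1
```

With the default coefficients this reduces to a simple sum of squares of the negative values, and a matrix with no negative entries gives a penalty of zero.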
To specify a MigrationPenalty likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type migrationpenalty
stockname <name for the stock to check>
powercoeffs <p0> <p1>
Note that it is not possible to aggregate more than one stock into a single pseudo stock for this likelihood component.
8.12 MigrationProportion
The MigrationProportion likelihood component is used to compare population proportion data sampled from the model with population proportion data sampled from landings or surveys. The population proportion data gives the proportion of the population that is present in each area on a given timestep. The likelihood score that is calculated gives some measure as to how well the migration data from the model fits the data from the sample catches.
To specify a MigrationProportion likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type migrationproportion
datafile <name for the datafile>
function <function name>
biomass <0 or 1> ; 1 to base migration proportion data on biomass
areaaggfile <area aggregation file specifying areas>
stocknames <vector of the names of the stocks>
The optional flag \(<\)biomass\(>\) is used to specify whether the migration proportion data should be based on the biomass of the stock or on the population numbers for the stock. If this is set to 0, then the migration proportion data calculated in the model will be based on the available population numbers for the stock. If this line is not specified, then a biomass value of 1 is assumed and the migration proportion data calculated in the model will be based on the available population biomass of the stock.
The \(<\)stocknames\(>\) vector contains a list of all the stocks to be aggregated into a single pseudo stock for the purposes of the data comparison.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled migration proportion data to the input migration proportion data. Currently, there is only one likelihood function defined, so the valid function name is:
sumofsquares
- use a simple sum of squares function
Finally, the file specified by \(<\)datafile\(>\) contains a list of the migration proportion data that Gadget is to use to fit the likelihood function to, aggregated according to the aggregation files specified, for the migration proportions calculated in the model. The format of this file is given below:
<year> <step> <area> <ratio>
where \(<\)ratio\(>\) is the proportion of stock that is in area \(<\)area\(>\) for that timestep.
8.12.1 Sum of Squares Function
The sum of squares function calculates the likelihood component from equation (8.34) below:
\[\begin{equation} \tag{8.34} \ell = \sum_{\it time}\sum_{\it areas} \Big( P_{tr} - \pi_{tr} \Big) ^2\end{equation}\]
where:
\(<\) P \(>\) is the proportion of the data sample for that time/area combination
\(<\pi>\) is the proportion of the model sample for that time/area combination
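A minimal sketch of equation (8.34) follows. It assumes the modelled amounts (numbers or biomass, depending on the biomass flag) are available per area for each timestep and converts them to proportions before comparing them with the data. The function names and the dictionary layout are hypothetical.

```python
def area_proportions(amount_by_area):
    """Convert {area: amount} into {area: proportion of the total}."""
    total = sum(amount_by_area.values())
    return {area: amount / total for area, amount in amount_by_area.items()}

def migration_proportion_score(data, model_amounts):
    """data: {(year, step): {area: ratio}}
    model_amounts: {(year, step): {area: modelled amount}}"""
    score = 0.0
    for timestep, ratios in data.items():
        modelled = area_proportions(model_amounts[timestep])
        for area, ratio in ratios.items():
            score += (ratio - modelled[area]) ** 2
    return score
```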
8.13 CatchInKilos
The CatchInKilos likelihood component is used to compare the overall catch from the modelled fleets with landings data. This can be done for any fleet that has landings data available, but will give more useful information when used with fleets of type “LinearFleet”, since the “TotalFleet” fleet type will catch the amount specified in the input file (see section 7 for more information on the available fleet types).
To specify a CatchInKilos likelihood component, the format required in the main likelihood file is as follows:
[component]
name <name for the likelihood component>
weight <weight for the likelihood component>
type catchinkilos
datafile <name for the datafile>
function <function name>
aggregationlevel <0 or 1> ; 1 to aggregate data over the whole year
epsilon <epsilon>
areaaggfile <area aggregation file specifying areas>
fleetnames <vector of the names of the fleets>
stocknames <vector of the names of the stocks>
The optional flag \(<\)aggregationlevel\(>\) is used to specify whether the catch data should be aggregated over the whole year (by setting aggregation level to 1) or not aggregated, and calculated for each timestep (by setting aggregation level to 0). If this line is not specified, then an aggregation level of 0 is assumed, and the catch data is not aggregated over the whole year.
The \(<\)fleetnames\(>\) vector contains a list of all the fleets to be aggregated into a single pseudo fleet for the purposes of the data comparison. Similarly, the \(<\)stocknames\(>\) vector contains a list of all the stocks to be aggregated into a single pseudo stock.
The optional \(<\)epsilon\(>\) value is used in the likelihood function to avoid problems that would arise from taking the logarithm of zero. Epsilon is added to both the modelled and observed landings data, to ensure that these values are always positive, and thus should be set to a small number. The default value for \(<\)epsilon\(>\) is 10, which is used whenever it is not defined in the input file.
The \(<\)function name\(>\) defines what likelihood function is to be used to compare the modelled catch to the input catch. Currently, there is only one likelihood function defined, so the only valid function name is:
sumofsquares
- use a log sum of squares function
Finally, the file specified by \(<\)datafile\(>\) contains the landings data that Gadget is to use to fit the likelihood function to for the catch calculated in the model. The format of this file is given below:
<year> <step> <area> <fleet> <biomass>
where \(<\)biomass\(>\) is the catch for the timestep/area/fleet combination. The \(<\)step\(>\) column is optional if the \(<\)aggregationlevel\(>\) flag has been set to 1, since the data will be aggregated over the whole year. In this case, it is possible to specify the landings data in the following format:
<year> <area> <fleet> <biomass>
8.13.1 Sum of Squares Function
The sum of squares function calculates the likelihood component from equation (8.35) below:
\[\begin{equation} \tag{8.35} \ell = \sum_{\it time}\sum_{\it areas}\sum_{\it fleets} (\log(N_{trf} + \epsilon) - \log(\nu_{trf} + \epsilon))^2\end{equation}\]
where:
\(<\) N \(>\) is the catch biomass for that time/area/fleet combination
\(<\nu>\) is the modelled catch biomass for that time/area/fleet combination
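The sketch below shows how equation (8.35) and the aggregation level could interact. It assumes both the landings data and the modelled catch are held in dictionaries keyed by (year, step, area, fleet); the function names are hypothetical and the code is an illustration rather than Gadget's implementation.

```python
import math
from collections import defaultdict

def aggregate_over_year(catch):
    """Sum {(year, step, area, fleet): biomass} into {(year, area, fleet): biomass}."""
    totals = defaultdict(float)
    for (year, _step, area, fleet), biomass in catch.items():
        totals[(year, area, fleet)] += biomass
    return dict(totals)

def catch_in_kilos_score(observed, modelled, epsilon=10.0, aggregationlevel=0):
    """Log sum of squares of observed against modelled catch biomass."""
    if aggregationlevel == 1:
        # compare yearly totals instead of individual timesteps
        observed = aggregate_over_year(observed)
        modelled = aggregate_over_year(modelled)
    return sum(
        (math.log(observed[key] + epsilon) - math.log(modelled[key] + epsilon)) ** 2
        for key in observed
    )
```

If the model takes the right yearly catch but in the wrong timestep, the score is positive with an aggregation level of 0 and zero with an aggregation level of 1.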