Gather nll in a g3 model

g3l_distribution_sumofsquares(over = c("area"))

g3l_distribution_multinomial(epsilon = 10)

g3l_distribution_multivariate(rho_f, sigma_f, over = c("area"))

g3l_distribution_surveyindices_log(alpha = NULL, beta = 1)

g3l_distribution_surveyindices_linear(alpha = NULL, beta = 1)

g3l_distribution_sumofsquaredlogratios(epsilon = 10)

g3l_abundancedistribution(
        nll_name,
        obs_data,
        fleets = list(),
        stocks,
        function_f,
        transform_fs = list(),
        missing_val = 0,
        area_group = NULL,
        report = FALSE,
        nll_breakdown = FALSE,
        weight = substitute(
            g3_param(n, optimise = FALSE, value = 1),
            list(n = paste0(nll_name, "_weight"))),
        run_at = g3_action_order$likelihood)

g3l_catchdistribution(
        nll_name,
        obs_data,
        fleets = list(),
        stocks,
        function_f,
        transform_fs = list(),
        missing_val = 0,
        area_group = NULL,
        report = FALSE,
        nll_breakdown = FALSE,
        weight = substitute(
            g3_param(n, optimise = FALSE, value = 1),
            list(n = paste0(nll_name, "_weight"))),
        run_at = g3_action_order$likelihood)

g3_distribution_preview(
        obs_data,
        fleets = list(),
        stocks = list(),
        area_group = NULL)

Arguments

over

When comparing proportions of lengthgroups, specifies the dimensions that define the total. For example the default "area" means the proprtion of the current lengthgroup to all individuals in that area.

rho_f,sigma_f

formula substituted into multivariate calcuations, see below.

epsilon

Value to be used whenever the calculated probability is very unlikely. Default 10.

alpha

formula substituted into surveyindices calcuations to fix intercept of linear regression, or NULL if not fixed. See below.

beta

formula substituted into surveyindices calcuations to fix slope of linear regression, or NULL if not fixed. See below.

nll_name

Character string, used to define the variable name for obsstock and modelstock.

obs_data

Data.frame of observation data, for example the results of mfdb_sample_count.

Should at least have a year column, and a length or weight column. For more information, see "obs_data and data aggregation" below.

fleets

A list of g3_stock objects to collect catch data for. If empty, will collect abundance data for stocks instead.

stocks

A list of g3_stock objects to collect catch or abundance data for, depending if stocks were provided.

function_f

A formula to compare obsstock__x to modelstock__x and generate nll, defined by one of the g3l_distribution_* functions.

This will be adapted to compare either number (modelstock__num) or weight (modelstock__wgt) depending on what columns obs_data has.

transform_fs

A list of dimension name to formula to apply to model data before collating. See examples.

missing_val

Where there are missing values in the incoming data, value to replace them with.

area_group

mfdb_group or list mapping area names used in obs_data to integer model areas, see "obs_data and data aggregation" below.

report

If TRUE, add model and observation arrays to the model report, called cdist_nll_name_model__num/wgt and cdist_nll_name_obs__num/wgt respectively

nll_breakdown

Should the nll report be broken down by time? TRUE / FALSE

weight

Weighting applied to this likelihood component. Default is a g3_param that defaults to 1, allowing weights to be altered without recompiling.

run_at

Integer order that actions will be run within model, see g3_action_order.

Details

The actions will define the following variables in your model:

obsstock__num/wgt

A g3_stock instance that contains all observations in an array

modelstock__num/wgt

A g3_stock instance that groups in an identical fashion to obsstock, that will be filled with the model's predicted values

The model report will contain nll_cdist_nll_name__num and/or nll_cdist_nll_name__wgt, depending on the columns in obs_data (a number column will compare by individuals, and produce a corresponding num report). If nll_breakdown is TRUE, this will be an array with one entry per timestep.

g3l_abundancedistribution compares abundance of stocks, g3l_catchdistribution compares fleet catch. Thus providing fleets is mandatory for g3l_catchdistribution, and an error for g3l_abundancedistribution.

obs_data and data aggregation

The obs_data data.frame, as well as providing the observation data to compare the model data against, controls the grouping of model data to compare to the observation data, by inspecting the MFDB column attributes produced by e.g. mfdb_sample_count.

Metadata columns describe the observation datapoint in that row. The columns should be from this list:

year

Required. Year for the data point. Gaps in years will result in no comparison for that year

step

Optional. If there is no step column, then the data is assumed to be yearly, and the model data for all timesteps will be summed before comparing.

Model timestep for the data point. Gaps in steps will result in no comparison for that year/step.

length

Optional. If missing all lengthgroups in the model will be summed to compare to the data.

The column can be a factor, as generated by cut(), e.g cut(raw_length, c(seq(0, 50, by = 10), Inf), right = FALSE) for an open-ended upper group.

The column can be character strings also formatted as factors as above. The column entries are assumed to be sorted in order and converted back to a factor.

If open_ended = c('lower', 'upper') was used when querying MFDB for the data, then the bottom/top length groups will be modified to start from zero or be infinite respectively.

Any missing lengthgroups (when there is otherwise data for that year/step) will be compared to zero.

age

Optional. If missing all age-groups (if any) in the model will be summed to compare to the data.

Model ages will be grouped by the same groupings as MFDB used, thus if the data was formed with a query age = mfdb_group(young = 1:3, old = 4:5), then the model data will similarly have 2 groups in it.

Any missing ages (when there is otherwise data for that year/step) will be compared to zero.

stock

Optional. If this and stock_re are missing all stocks in stocks will be summed to compare to the data.

The values in the stocks column should match the names of the stocks given in the stocks parameter. This column can be factor or character.

Any missing stocks (when there is otherwise data for that year/step) will be compared to zero.

stock_re

Optional. If this and stock are missing all stocks in stocks will be summed to compare to the data.

The values in the stocks column will be used as regular expressions to match the names of the stocks given in the stocks parameter. For example, '_mat_' will match both 'ghd_mat_f' and 'ghd_mat_m' and will be compared against the sum of the 2 stocks.

Any missing stocks (when there is otherwise data for that year/step) will be compared to zero.

fleet

Optional. If this and fleet_re are missing all fleets in fleets will be summed to compare to the data.

The values in the fleets column should match the names of the fleets given in the fleets parameter. This column can be factor or character.

Any missing fleets (when there is otherwise data for that year/step) will be compared to zero.

fleet_re

Optional. If this and fleet are missing all fleets in fleets will be summed to compare to the data.

The values in the fleets column will be used as regular expressions to match the names of the fleets given in the fleets parameter. For example, '_trawl_' will match both 'fleet_trawl_is' and 'fleet_trawl_no' and will be compared against the sum of the 2 fleets.

Any missing fleets (when there is otherwise data for that year/step) will be compared to zero.

area

Optional. If missing all areas in the model will be summed to compare to the data.

Unlike other columns, the MFDB grouping here is ignored (the areas it is grouping over aren't integer model areas). Instead, the area_group parameter should describe how to map from the area names used in the table to integer model areas.

For example, if area_group = list(north=1:2, south=3:5), then the area column of obs_data should contain either "north" or "south", and corresponding model data will be summed from integer model areas 1,2 and 3,4,5 respectively.

If area_group is not supplied, then we assume that obs_data area column will contain model area integers.

Any missing areas (when there is otherwise data for that year/step) will be compared to zero.

Data columns contain the observation data to compare. There should be at least one of:

number

If a number column appears in obs_data, then the stock abundance by individuals will be aggregated and compared to the obs_data number column.

weight

If a weight column appears in obs_data, then the total biomass of the stock will be aggregated and compared to the obs_data number column.

You can use g3_distribution_preview to see how your observation data will be converted into an array.

Value

g3l_distribution_sumofsquares

Returns a formula for use as function_f:

$$ \sum_{\it lengths} \Big( \frac{N_{tral}}{N_{tr}} - \frac{\nu_{tral}}{\nu_{tr}} \Big) ^2 $$

\(N_{tral}\)

Observation sample size for current time/area/age/length combination

\(\nu_{tral}\)

Model sample size for current time/area/age/length combination

\(N_{tr}\)

Total observation sample size for current time/area (or dimensions set in over)

\(\nu_{tr}\)

Total model sample size for current time/area (or dimensions set in over)

g3l_distribution_multinomial

Returns a formula for use as function_f:

$$ 2 ( \sum_{\it lengths} \log N_{tral}! - \log (\sum_{\it lengths} N_{tral})! - \sum_{\it lengths} ( N_{tral} \log min(\frac{\nu_{tral}}{\sum_{\it lengths} \nu_{tral}}, \frac{1}{l \epsilon}) ) ) $$

\(N_{tral}\)

Observation sample size for current time/area/age/length combination

\(\nu_{tral}\)

Model sample size for current time/area/age/length combination

\(l\)

Number of lengthgroups in sample

\(\epsilon\)

epsilon parameter

g3l_distribution_multivariate

Returns a formula for use as function_f, which calls TMB's SCALE(AR1(rho), sigma)(x), where rho and sigma are parameters, and x is defined as:

$$ \frac{N_{tral}}{N_{tr}} - \frac{\nu_{tral}}{\nu_{tr}} $$

\(N_{tral}\)

Observation sample size for current time/area/age/length combination

\(\nu_{tral}\)

Model sample size for current time/area/age/length combination

\(N_{tr}\)

Total observation sample size for current time/area (or dimensions set in over)

\(\nu_{tr}\)

Total model sample size for current time/area (or dimensions set in over)

For more information, see Autoregressive processes in the TMB book.

g3l_distribution_surveyindices_log

Returns a formula for use as function_f:

$$ \sum_{\it time} (\alpha + \beta \log N_{tral} - \log \nu_{tral})^2 $$

\(N_{tral}\)

Observation sample size for current time/area/age/length combination

\(\nu_{tral}\)

Model sample size for current time/area/age/length combination

\(\alpha\)

alpha parameter

\(\beta\)

beta parameter

If alpha or beta is not provided, then linear regression is performed on \(N\), \(\nu\) over time for each area/age/length combination. The used values will be stored in a cdist_nll_name_model__param array and reported after model run, whether calculated or hard-coded.

g3l_distribution_surveyindices_linear

Returns a formula for use as function_f:

$$ \sum_{\it lengths} (\alpha + \beta N_{tral} - \nu_{tral})^2 $$

\(N_{tral}\)

Observation sample size for current time/area/age/length combination

\(\nu_{tral}\)

Model sample size for current time/area/age/length combination

\(\alpha\)

alpha parameter

\(\beta\)

beta parameter

If alpha or beta is not provided, then linear regression is performed on \(N\), \(\nu\) over time for each area/age/length combination. The used values will be stored in a cdist_nll_name_model__param array and reported after model run, whether calculated or hard-coded.

g3l_distribution_sumofsquaredlogratios

The equivalent of gadget2's catchinkilos.

Returns a formula for use as function_f:

$$ \sum_{\it lengths} (log(N_{tral} + \epsilon) - log(\nu_{tral} + \epsilon))^2 $$

\(N_{tral}\)

Observation sample size for current time/area/age/length combination

\(\nu_{tral}\)

Model sample size for current time/area/age/length combination

\(\epsilon\)

epsilon parameter

g3l_abundancedistribution

An action (i.e. list of formula objects) that will...

  1. For all stocks, collect catch data into modelstock__num or modelstock__wgt, depending on the columns provided in obs_data

  2. Compare modelstock__num/wgt with obsstock__num/wgt, using function_f

The output of function_f is summed over all stock dimensions (age/area) and time and added to nll.

g3l_catchdistribution

An action (i.e. list of formula objects) that will...

  1. For all fleets and stocks combinations, collect catch data into modelstock__num or modelstock__wgt, depending on the columns provided in obs_data

  2. Compare modelstock__num/wgt with obsstock__num/wgt, using function_f

The output of function_f is summed over all stock dimensions (age/area) and time and added to nll.

g3_distribution_preview

The input obs_data formatted as an array, applying the same rules that g3l_*distribution will.

Examples

library(magrittr)
ling_imm <- g3_stock('ling_imm', seq(20, 156, 4)) %>% g3s_age(3, 10)
ling_mat <- g3_stock('ling_mat', seq(20, 156, 4)) %>% g3s_age(5, 15)
lln <- g3_fleet('lln')

# Invent a ldist.lln table for our tests
ldist.lln.raw <- data.frame(
    year = c(1999, 2000),
    age = sample(5:9, 100, replace = TRUE),
    length = sample(10:70, 100, replace = TRUE),
    number = 1,
    stringsAsFactors = FALSE)

# Group length into 10-long bins
# NB: The last 2 bins will be empty, but gadget3 will use the factor levels, include them as zero
# NB: Generally one would use mfdb::mfdb_sample_count() source and group data for you
ldist.lln.raw |> dplyr::group_by(
  year = year, age = age,
  length = cut(length, breaks = seq(10, 100, by = 10), right = FALSE)
) |> dplyr::summarise(number = sum(number), .groups = 'keep') -> ldist.lln

# Turn age into a factor, indicating all ages we should be interested in
ldist.lln$age <- factor(ldist.lln$age, levels = 5:15)

# We can see the results of this being turned into an array:
g3_distribution_preview(ldist.lln)
#> , , time = 1999
#> 
#>         age
#> length   5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:12 13:13 14:14 15:15
#>   10:20    1   4   3   2   2    NA    NA    NA    NA    NA    NA
#>   20:30    2   2   2   3   1    NA    NA    NA    NA    NA    NA
#>   30:40    1   1   2   1   1    NA    NA    NA    NA    NA    NA
#>   40:50    2   3  NA   1  NA    NA    NA    NA    NA    NA    NA
#>   50:60    2   1  NA   1  NA    NA    NA    NA    NA    NA    NA
#>   60:70    1   4   2   2   3    NA    NA    NA    NA    NA    NA
#>   70:80   NA  NA  NA  NA  NA    NA    NA    NA    NA    NA    NA
#>   80:90   NA  NA  NA  NA  NA    NA    NA    NA    NA    NA    NA
#>   90:100  NA  NA  NA  NA  NA    NA    NA    NA    NA    NA    NA
#> 
#> , , time = 2000
#> 
#>         age
#> length   5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:12 13:13 14:14 15:15
#>   10:20    1   2   2   1   2    NA    NA    NA    NA    NA    NA
#>   20:30    2   1   1  NA   2    NA    NA    NA    NA    NA    NA
#>   30:40    2   1   1   1   3    NA    NA    NA    NA    NA    NA
#>   40:50    3   1   1  NA   3    NA    NA    NA    NA    NA    NA
#>   50:60    1   1   3   2   2    NA    NA    NA    NA    NA    NA
#>   60:70    5  NA   1   4   1    NA    NA    NA    NA    NA    NA
#>   70:80   NA  NA  NA  NA  NA    NA    NA    NA    NA    NA    NA
#>   80:90   NA  NA  NA  NA  NA    NA    NA    NA    NA    NA    NA
#>   90:100  NA  NA  NA  NA  NA    NA    NA    NA    NA    NA    NA
#> 

likelihood_actions <- list(
  g3l_catchdistribution(
    'ldist_lln',
    ldist.lln,
    fleets = list(lln),
    stocks = list(ling_imm, ling_mat),
    g3l_distribution_sumofsquares()))

# Make an (incomplete) model using the action, extract the observation array
fn <- suppressWarnings(g3_to_r(likelihood_actions))
environment(fn)$cdist_sumofsquares_ldist_lln_obs__num
#> , , time = 1999
#> 
#>         age
#> length   5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:12 13:13 14:14 15:15
#>   10:20    1   4   3   2   2     0     0     0     0     0     0
#>   20:30    2   2   2   3   1     0     0     0     0     0     0
#>   30:40    1   1   2   1   1     0     0     0     0     0     0
#>   40:50    2   3   0   1   0     0     0     0     0     0     0
#>   50:60    2   1   0   1   0     0     0     0     0     0     0
#>   60:70    1   4   2   2   3     0     0     0     0     0     0
#>   70:80    0   0   0   0   0     0     0     0     0     0     0
#>   80:90    0   0   0   0   0     0     0     0     0     0     0
#>   90:100   0   0   0   0   0     0     0     0     0     0     0
#> 
#> , , time = 2000
#> 
#>         age
#> length   5:5 6:6 7:7 8:8 9:9 10:10 11:11 12:12 13:13 14:14 15:15
#>   10:20    1   2   2   1   2     0     0     0     0     0     0
#>   20:30    2   1   1   0   2     0     0     0     0     0     0
#>   30:40    2   1   1   1   3     0     0     0     0     0     0
#>   40:50    3   1   1   0   3     0     0     0     0     0     0
#>   50:60    1   1   3   2   2     0     0     0     0     0     0
#>   60:70    5   0   1   4   1     0     0     0     0     0     0
#>   70:80    0   0   0   0   0     0     0     0     0     0     0
#>   80:90    0   0   0   0   0     0     0     0     0     0     0
#>   90:100   0   0   0   0   0     0     0     0     0     0     0
#> 

# Apply age-reading error matrix to model data
more_likelihood_actions <- list(
  g3l_catchdistribution(
    'ldist_lln_readerror',
    ldist.lln,
    fleets = list(lln),
    stocks = list(ling_imm, ling_mat),
    transform_fs = list(age = g3_formula(
      g3_param_array('reader1matrix', value = diag(5))[g3_idx(preage), g3_idx(age)]
      )),
    g3l_distribution_sumofsquares()))

# Apply per-stock age-reading error matrix to model data
more_likelihood_actions <- list(
  g3l_catchdistribution(
    'ldist_lln_readerror',
    ldist.lln,
    fleets = list(lln),
    stocks = list(ling_imm, ling_mat),
    transform_fs = list(age = g3_formula(stock_switch(stock,
      ling_imm = g3_param_array('imm_readermatrix',
          value = diag(ling_imm__maxage - ling_imm__minage + 1)
          )[ling_imm__preage_idx, ling_imm__age_idx],
      ling_mat = g3_param_array('mat_readermatrix',
          value = diag(ling_mat__maxage - ling_mat__minage + 1)
          )[ling_mat__preage_idx, ling_mat__age_idx],
      unused = 0))),
    g3l_distribution_sumofsquares()))