Gadget3 likelihood actions for sparse data

Compare model predictions against a set of sparse data points

g3l_sparsesample_linreg(
        fit = c('log', 'linear'),
        slope = 1,
        intercept = NULL )

g3l_sparsesample_sumsquares(
        weighting = "model_stddev" )

g3l_sparsesample(
        nll_name,
        obs_df,
        stocks,
        measurement_f = quote( wgt ),
        function_f = g3l_sparsesample_linreg(),
        predstocks = list(),
        area_group = NULL,
        weight = g3_parameterized(paste(
            if (length(predstocks) > 0) "csparse" else "asparse",
            function_f_name,
            nll_name,
            "weight",
            sep = "_"), optimise = FALSE, value = 1),
        run_at = g3_action_order$likelihood )

Arguments

slope, intercept: formula substituted into surveyindices calcuations to fix slope/intercept of linear regression, or NULL if not fixed. See below.
fit: Is the fit 'log' or 'linear'? See below.
weighting: Weighting applied to sum-of-squares. One of "model_stddev", "obs_stddev" or a formula.
nll_name: Character string, used to define the variable name for obsstock and modelstock. By default set to (asparse|csparse)_(name_of_function_f)_(nll_name)_weight.
obs_df: Data.frame of observation data. See details.
stocks: A list of g3_stock objects to collect sparsesample data for, depending if stocks were provided.
measurement_f: formula to derive the model's equivalent predicted value for a data point. You can use wgt to refer to weight of matching individuals, length to refer to length of matching individuals.
function_f: A formula to compare obs_df to predicted values generated via transform_f and generate nll, defined by one of the g3l_sparsesample_* functions.
predstocks: A list of g3_stock predator or fleet objects. If present, we will compare against the model predicted catch. Without (the default), we compare against overall abundance.
area_group: List mapping area names used in obs_df to integer model areas, most likely generated by g3_areas.
weight: Weighting applied to this likelihood component. Default is a g3_param that defaults to 1, allowing weights to be altered without recompiling.
run_at: Integer order that actions will be run within model, see g3_action_order.

Details

The actions will define the following variables in your model, which could be reported with g3a_report_history:

nll_sp(abund|catch)_name__obs_mean: Observation mean, the mean column from obs_df
nll_sp(abund|catch)_name__obs_stddev: Observation standard deviation, the stddev column from obs_df
nll_sp(abund|catch)_name__obs_n: Observation number, the number column from obs_df
nll_sp(abund|catch)_name__model_sum: The corresponding model prediction vector, total datapoints. __model_sum / __model_n for the mean
nll_sp(abund|catch)_name__model_sqsum: The corresponding model prediction vector, sqared-sum datapoints.
nll_sp(abund|catch)_name__model_n: The number of data points at each point in the model prediction vector, if predstocks set this is the number of individuals caught matching the datapoint (length/age/...), otherwise abundance of individuals matching the datapoint.

obs_df format

data.frame of observation data. Unlike g3l_abundancedistribution, gaps and sparse data is accepted, and gaps will not be filled with zero.

For each row in the table, all matching predictions are aggregated. Aggregation columns include:

year: Required. The year the sample is from
step: Optional. The timestep/season the sample is from
area: Optional. Only aggregate predicted values from given area
age: Optional. Only aggregate predicted values with given age
length: Optional. Only aggregate predicted values with given length (matches nearest lengthgroup)

So, a row with "year=1998,age=4" will be compared against age 4 individuals of all lengths in 1998, step 1 & 2. A row with "year=2004,step=1,age=2,length=19" will be compared against individuals of age 4, length 10..20, in winter 2004.

The observation data is provided in the following columns:

mean: Required. Mean value at this data point
number: Optional. Number of data points, defaults to 1
stddev: Optional. Observed standard deviation (only required if weighting = "obs_stddev")

Value

g3l_sparsesample_linreg

Returns a formula for use as function_f:

If fit = "log": $$ \sum_{\it i}^{rows} (\alpha + \beta \log N_{i} - \log \frac{\nu_{i}}{P_{i}})^2 $$ If fit = "linear": $$ \sum_{\it i}^{rows} (\alpha + \beta N_{i} - \frac{\nu_{i}}{P_{i}})^2 $$

$N_{i}$: "mean" column from obs_df
$\nu_{i}$: Total predicted values for all data points, i.e. nll_spabund_name__model_sum
$P_{i}$: Number of data points, i.e. nll_spabund_name__model_n
$\alpha$: intercept parameter, defaults to 1, i.e. fixed slope
$\beta$: slope parameter, defaults to NULL, i.e. linear regression performed to find optimal value

If either alpha or beta is not provided, then linear regression is performed on $N$ vs $\nu$ for each value in table, and the optimal value used for each.

g3l_sparsesample_sumsquares

Returns a formula for use as function_f:

$$ \sum_{\it i}^{rows} w (\frac{\nu_{i}}{P_{i}} - N_{i})^2 $$

$N_{i}$

"mean" column from obs_df

$\nu_{i}$

Total predicted values, i.e. nll_spabund_name__model_sum

$P_{i}$

Number of data points, i.e. nll_spabund_name__model_n

$w$

weighting parameter, either:

$1 / \sigma^2$, using stddev of model predicted values if weighting = "model_stddev"
$1 / \sigma^2$, using stddev column from obs_df if weighting = "obs_stddev"
A custom forumla provided for weighting

Examples

st <- g3_stock("fish", c(10, 20, 30)) %>% g3s_age(3,5)

# Generate some random sparsesample samples
obs_df <- data.frame(
    # NB: No 1993, we don't have any samples for that year
    year = rep(c(1990, 1991, 1992, 1994), each = 2),
    step = 1:2 )
obs_df$age = floor(runif(nrow(obs_df), min = 3, max = 5.1))
obs_df$length = floor(runif(nrow(obs_df), min = 10, max = 50))
obs_df$mean = runif(nrow(obs_df), min = 10, max = 1000)

actions <- list(
    g3a_time(1990, 1994, c(6,6)),
    # Use otherfood to populate abundance / mean weight
    g3a_otherfood(st,
        quote( age * 100 + stock__minlen ),
        quote( cur_year * 1e5 + cur_step * 1e4 + 0 * stock__minlen ) ),
    g3l_sparsesample(
        "bt",
        obs_df,
        list(st),
        measurement_f = g3_formula(
          # Derive blubber thickness from length/weight
          ((wgt/(wmax.a * length^wmax.b) - 0.5) * 100 - 4.44) / (5693 * (length/wgt)^0.5),
          wmax.a = g3_parameterized("wmax.a", by_stock = TRUE),
          wmax.b = g3_parameterized("wmax.b", by_stock = TRUE),
          end = NULL ),
        function_f = g3l_sparsesample_linreg(fit = "linear") ),
    NULL )

model_fn <- g3_to_r(c(actions, list(
    g3a_report_detail(actions),  # TODO: Not reporting anything useful
    NULL )))
r <- attributes(model_fn())
colSums(r$dstart_fish__num)  # TODO: Report something related
#>       time
#> age    1990-01 1990-02 1991-01 1991-02 1992-01 1992-02 1993-01 1993-02 1994-01
#>   age3     960     960     960     960     960     960     960     960     960
#>   age4    1260    1260    1260    1260    1260    1260    1260    1260    1260
#>   age5    1560    1560    1560    1560    1560    1560    1560    1560    1560
#>       time
#> age    1994-02
#>   age3     960
#>   age4    1260
#>   age5    1560