Compare model predictions against a set of sparse data points

g3l_sparsesample_linreg(
        fit = c('log', 'linear'),
        slope = 1,
        intercept = NULL )

g3l_sparsesample_sumsquares(
        weighting = "model_stddev" )

g3l_sparsesample(
        nll_name,
        obs_df,
        stocks,
        measurement_f = quote( wgt ),
        function_f = g3l_sparsesample_linreg(),
        predstocks = list(),
        area_group = NULL,
        weight = g3_parameterized(paste(
            if (length(predstocks) > 0) "csparse" else "asparse",
            function_f_name,
            nll_name,
            "weight",
            sep = "_"), optimise = FALSE, value = 1),
        run_at = g3_action_order$likelihood )

Arguments

slope, intercept

formula substituted into surveyindices calcuations to fix slope/intercept of linear regression, or NULL if not fixed. See below.

fit

Is the fit 'log' or 'linear'? See below.

weighting

Weighting applied to sum-of-squares. One of "model_stddev", "obs_stddev" or a formula.

nll_name

Character string, used to define the variable name for obsstock and modelstock. By default set to (asparse|csparse)_(name_of_function_f)_(nll_name)_weight.

obs_df

Data.frame of observation data. See details.

stocks

A list of g3_stock objects to collect sparsesample data for, depending if stocks were provided.

measurement_f

formula to derive the model's equivalent predicted value for a data point. You can use wgt to refer to weight of matching individuals, length to refer to length of matching individuals.

function_f

A formula to compare obs_df to predicted values generated via transform_f and generate nll, defined by one of the g3l_sparsesample_* functions.

predstocks

A list of g3_stock predator or fleet objects. If present, we will compare against the model predicted catch. Without (the default), we compare against overall abundance.

area_group

List mapping area names used in obs_df to integer model areas, most likely generated by g3_areas.

weight

Weighting applied to this likelihood component. Default is a g3_param that defaults to 1, allowing weights to be altered without recompiling.

run_at

Integer order that actions will be run within model, see g3_action_order.

Details

The actions will define the following variables in your model, which could be reported with g3a_report_history:

nll_sp(abund|catch)_name__obs_mean

Observation mean, the mean column from obs_df

nll_sp(abund|catch)_name__obs_stddev

Observation standard deviation, the stddev column from obs_df

nll_sp(abund|catch)_name__obs_n

Observation number, the number column from obs_df

nll_sp(abund|catch)_name__model_sum

The corresponding model prediction vector, total datapoints. __model_sum / __model_n for the mean

nll_sp(abund|catch)_name__model_sqsum

The corresponding model prediction vector, sqared-sum datapoints.

nll_sp(abund|catch)_name__model_n

The number of data points at each point in the model prediction vector, if predstocks set this is the number of individuals caught matching the datapoint (length/age/...), otherwise abundance of individuals matching the datapoint.

obs_df format

data.frame of observation data. Unlike g3l_abundancedistribution, gaps and sparse data is accepted, and gaps will not be filled with zero.

For each row in the table, all matching predictions are aggregated. Aggregation columns include:

year

Required. The year the sample is from

step

Optional. The timestep/season the sample is from

area

Optional. Only aggregate predicted values from given area

age

Optional. Only aggregate predicted values with given age

length

Optional. Only aggregate predicted values with given length (matches nearest lengthgroup)

So, a row with "year=1998,age=4" will be compared against age 4 individuals of all lengths in 1998, step 1 & 2. A row with "year=2004,step=1,age=2,length=19" will be compared against individuals of age 4, length 10..20, in winter 2004.

The observation data is provided in the following columns:

mean

Required. Mean value at this data point

number

Optional. Number of data points, defaults to 1

stddev

Optional. Observed standard deviation (only required if weighting = "obs_stddev")

Value

g3l_sparsesample_linreg

Returns a formula for use as function_f:

If fit = "log": $$ \sum_{\it i}^{rows} (\alpha + \beta \log N_{i} - \log \frac{\nu_{i}}{P_{i}})^2 $$ If fit = "linear": $$ \sum_{\it i}^{rows} (\alpha + \beta N_{i} - \frac{\nu_{i}}{P_{i}})^2 $$

\(N_{i}\)

"mean" column from obs_df

\(\nu_{i}\)

Total predicted values for all data points, i.e. nll_spabund_name__model_sum

\(P_{i}\)

Number of data points, i.e. nll_spabund_name__model_n

\(\alpha\)

intercept parameter, defaults to 1, i.e. fixed slope

\(\beta\)

slope parameter, defaults to NULL, i.e. linear regression performed to find optimal value

If either alpha or beta is not provided, then linear regression is performed on \(N\) vs \(\nu\) for each value in table, and the optimal value used for each.

g3l_sparsesample_sumsquares

Returns a formula for use as function_f:

$$ \sum_{\it i}^{rows} w (\frac{\nu_{i}}{P_{i}} - N_{i})^2 $$

\(N_{i}\)

"mean" column from obs_df

\(\nu_{i}\)

Total predicted values, i.e. nll_spabund_name__model_sum

\(P_{i}\)

Number of data points, i.e. nll_spabund_name__model_n

\(w\)

weighting parameter, either:

  1. \(1 / \sigma^2\), using stddev of model predicted values if weighting = "model_stddev"

  2. \(1 / \sigma^2\), using stddev column from obs_df if weighting = "obs_stddev"

  3. A custom forumla provided for weighting

Examples

st <- g3_stock("fish", c(10, 20, 30)) %>% g3s_age(3,5)

# Generate some random sparsesample samples
obs_df <- data.frame(
    # NB: No 1993, we don't have any samples for that year
    year = rep(c(1990, 1991, 1992, 1994), each = 2),
    step = 1:2 )
obs_df$age = floor(runif(nrow(obs_df), min = 3, max = 5.1))
obs_df$length = floor(runif(nrow(obs_df), min = 10, max = 50))
obs_df$mean = runif(nrow(obs_df), min = 10, max = 1000)

actions <- list(
    g3a_time(1990, 1994, c(6,6)),
    # Use otherfood to populate abundance / mean weight
    g3a_otherfood(st,
        quote( age * 100 + stock__minlen ),
        quote( cur_year * 1e5 + cur_step * 1e4 + 0 * stock__minlen ) ),
    g3l_sparsesample(
        "bt",
        obs_df,
        list(st),
        measurement_f = g3_formula(
          # Derive blubber thickness from length/weight
          ((wgt/(wmax.a * length^wmax.b) - 0.5) * 100 - 4.44) / (5693 * (length/wgt)^0.5),
          wmax.a = g3_parameterized("wmax.a", by_stock = TRUE),
          wmax.b = g3_parameterized("wmax.b", by_stock = TRUE),
          end = NULL ),
        function_f = g3l_sparsesample_linreg(fit = "linear") ),
    NULL )

model_fn <- g3_to_r(c(actions, list(
    g3a_report_detail(actions),  # TODO: Not reporting anything useful
    NULL )))
r <- attributes(model_fn())
colSums(r$dstart_fish__num)  # TODO: Report something related
#>       time
#> age    1990-01 1990-02 1991-01 1991-02 1992-01 1992-02 1993-01 1993-02 1994-01
#>   age3     960     960     960     960     960     960     960     960     960
#>   age4    1260    1260    1260    1260    1260    1260    1260    1260    1260
#>   age5    1560    1560    1560    1560    1560    1560    1560    1560    1560
#>       time
#> age    1994-02
#>   age3     960
#>   age4    1260
#>   age5    1560