MareFrame DB queries — mfdb

Aggregate data from the database in a variety of ways

mfdb_area_size(mdb, params)
mfdb_area_size_depth(mdb, params)
mfdb_temperature(mdb, params)
mfdb_survey_index_mean(mdb, cols, params, scale_index = NULL)
mfdb_survey_index_total(mdb, cols, params, scale_index = NULL)
mfdb_sample_count(mdb, cols, params, scale_index = NULL)
mfdb_sample_meanlength(mdb, cols, params, scale_index = NULL)
mfdb_sample_meanlength_stddev(mdb, cols, params, scale_index = NULL)
mfdb_sample_totalweight(mdb, cols, params, measurements = c('overall'))
mfdb_sample_meanweight(mdb, cols, params, scale_index = NULL,
                       measurements = c('overall'))
mfdb_sample_meanweight_stddev(mdb, cols, params, scale_index = NULL,
                              measurements = c('overall'))
mfdb_sample_rawdata(mdb, cols, params, scale_index = NULL)
mfdb_sample_scaled(mdb, cols, params, abundance_scale = NULL, scale = 'tow_length')
mfdb_stomach_preycount(mdb, cols, params)
mfdb_stomach_preymeanlength(mdb, cols, params)
mfdb_stomach_preymeanweight(mdb, cols, params)
mfdb_stomach_preyweightratio(mdb, cols, params)
mfdb_stomach_presenceratio(mdb, cols, params)

Arguments

mdb: An object created by mfdb()
cols: Any additonal columns to group by, see details.
params: A list of parameters, see details.
scale_index: Optional. survey_index used to scale results before aggregation, either "tow_length", "area_size" or from mfdb_import_survey_index
abundance_scale: Optional. Same as scale_index
scale: Optional. A scale to apply to the resulting values, e.g. 'tow_length'
measurements: Optional, default 'overall'. A vector of measurement names to use, one of overall, liver, gonad, stomach

Details

The items in the params list either restrict data that is returned, or groups data if they are also in the cols vector, or are 'year', 'timestep', or 'area'.

If you are grouping by the column, params should contain one of the following:

NULL: Don't do any grouping, instead put 'all' in the resulting column. For example, age = NULL results in "all".
character / numeric vector: Aggregate all samples together where they match. For example, year = 1990:2000 results in 1990, ... , 2000.
mfdb_unaggregated(): Don't do any aggregation for this column, return all possible values.
mfdb_group(): Group several discrete items together. For example, age = mfdb_group(young = 1:3, old = 4:5) results in "young" and "old".
mfdb_interval(): Group irregular ranges together. For example, length = mfdb_interval('len', c(0, 10, 100, 1000)) results in "len0", "len10", "len100" (1000 is the upper bound to len100).
mfdb_step_interval(): Group regular ranges together. For example, length = mfdb_step_interval('len', to = 100, by = 10) results in "len0", "len10", ... , "len90".

In addition, params can contain other arguments to purely restrict the data that is returned.

institute: A vector of institute names / countries, see mfdb::institute for possible values
gear: A vector of gear names, see mfdb::gear for possible values
vessel: A vector of vessel names, see mfdb::vessel for possible values
sampling_type: A vector of sampling_type names, see mfdb::sampling_type for possible values
species: A vector of species names, see mfdb::species for possible values
sex: A vector of sex names, see mfdb::sex for possible values

To save specifying the same items repeatedly, you can use list concatenation to keep some defaults, for example:


defaults <- list(year = 1998:2000)
mfdb_sample_meanlength(mdb, c('age'), c(list(), defaults))

scale_index allows you to scale samples before aggregation. If it contains the name of a survey index (see mfdb_import_survey_index), then any counts will be scaled by the value for that areacell before and used in aggregation / weighted averages. As a special case, you can use "tow_length" to to scale counts by the tow length.

Value

All will return a list of data.frame objects. If there was no bootstrapping requested, there will be only one. Otherwise, there will be one for each sample.

The columns of these data frames depends on the function called.

mfdb_area_size

Returns area, (total area) size

mfdb_area_size_depth

Returns area, (total area) size, mean depth, weighted by area size

mfdb_temperature

Returns year, step, area, (mean) temperature

mfdb_survey_index_mean

Returns year, step, area, (group cols), (mean) survey index

mfdb_survey_index_total

Returns year, step, area, (group cols), (sum) survey index

mfdb_sample_count

Returns year, step, area, (group cols), number (i.e sum of count)

mfdb_sample_meanlength

Return year, step, area, (group cols), number (i.e sum of count), mean (length)

mfdb_sample_meanlength_stddev

As mfdb_sample_meanlength, but also returns std. deviation.

mfdb_sample_totalweight

Returns year,step,area,(group cols),total (weight of group)

mfdb_sample_meanweight

Returns year, step, area, (group cols), number (i.e sum of count), mean (weight)

mfdb_sample_meanweight_stddev

As mfdb_sample_meanweight, but also returns std. deviation.

mfdb_sample_rawdata

Returns year,step,area,(group cols),number of samples, raw_weight and raw_length.

NB: No grouping of results is performed, instead all matching table entries are returned

mfdb_sample_scaled

Returns year, step, area, (group cols), number (i.e. sum of count, scaled by tow_length), mean_weight (scaled by tow_length)

mfdb_stomach_preycount

Returns year, step, area, (group cols), number (of prey found in stomach)

mfdb_stomach_preymeanlength

Returns year, step, area, (group cols), number (of prey found in stomach), mean_length (of prey found in stomach). NB: Entries where count is NA (i.e. totals) are ignored with this function.

mfdb_stomach_preymeanweight

Returns year, step, area, (group cols), number (of unique stomachs in group), mean_weight (per unique stomach).

mfdb_stomach_preyweightratio

Returns year, step, area, (group cols), ratio (of selected prey in stomach to all prey by weight)

mfdb_stomach_presenceratio

Returns year, step, area, (group cols), ratio (of selected prey in stomach to all prey by count)

Examples

mdb <- mfdb(tempfile(fileext = '.duckdb'))
#> 2026-01-25 17:20:59.349771 INFO:mfdb:Creating schema from scratch
#> 2026-01-25 17:20:59.762378 INFO:mfdb:Taxonomy market_category no updates to make
#> 2026-01-25 17:21:00.301682 INFO:mfdb:Schema up-to-date

# Define 2 areacells of equal size
mfdb_import_area(mdb, data.frame(name=c("divA", "divB"), size=1))

# Make up some samples
samples <- expand.grid(
    year = 1998,
    month = c(1:12),
    areacell = c("divA", "divB"),
    species = 'COD',
    age = c(1:5),
    length = c(0,40,80))
samples$count <- runif(nrow(samples), 20, 90)
mfdb_import_survey(mdb, data_source = "x", samples)

# Query numbers by age and length
agg_data <- mfdb_sample_count(mdb, c('age', 'length'), list(
    length = mfdb_interval("len", seq(0, 500, by = 30)),
    age = mfdb_group('young' = c(1,2), old = 3),
    year = c(1998)))
agg_data
#> $`0.0.0.0.0`
#>   year step area   age length   number
#> 1 1998  all  all   old   len0 1280.437
#> 2 1998  all  all   old  len30 1345.882
#> 3 1998  all  all   old  len60 1308.701
#> 4 1998  all  all young   len0 2612.921
#> 5 1998  all  all young  len30 2454.455
#> 6 1998  all  all young  len60 2468.534
#> 

# Use in a catchdistribution likelihood component
gadget_dir_write(gadget_directory(tempfile()), gadget_likelihood_component("catchdistribution",
        name = "cdist",
        weight = 0.9,
        data = agg_data[[1]],
        area = attr(agg_data[[1]], "area"),
        age = attr(agg_data[[1]], "age")))

mfdb_disconnect(mdb)