Storing & grouping by arbitrary populations

The examples below use this table_string helper to succinctly define tables:

# Convert a string into a data.frame
table_string <- function (text, ...) read.table(
    text = text,
    blank.lines.skip = TRUE,
    header = TRUE,
    stringsAsFactors = FALSE,
    ...)
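
As a quick check of what the helper returns, the sketch below parses a small table and then forwards an extra argument through `...`. It uses only base R; the variable names `areas` and `areas_chr` are illustrative and not part of mfdb.

```r
# Same helper as above, repeated so this block runs on its own
table_string <- function (text, ...) read.table(
    text = text,
    blank.lines.skip = TRUE,
    header = TRUE,
    stringsAsFactors = FALSE,
    ...)

# Whitespace-separated columns become data.frame columns
areas <- table_string('
name  division size
45G01     divA   10
45G02     divA  200
45G03     divB  400
')
str(areas)

# Extra arguments are passed on to read.table, e.g. to force every
# column to be read as character rather than letting R guess the type
areas_chr <- table_string('
name  division size
45G01     divA   10
', colClasses = 'character')
class(areas_chr$size)
```

Forcing column types this way can be useful when identifiers might otherwise be guessed as numbers.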

Firstly, connect to a database and set up some areas/divisions:

library(mfdb)

# Create a temporary DuckDB-backed database for this example
mdb <- mfdb(tempfile(fileext = '.duckdb'))
mfdb_import_area(mdb, table_string('
name  division size
45G01     divA   10
45G02     divA  200
45G03     divB  400
'))

Importing data

Populations are arbitrary groupings for defining logical stocks that can’t be derived from other MFDB data.

As with other metadata, we have to import the valid values before using them:

mfdb_import_population_taxonomy(mdb, table_string('
name    description                          t_group
ns      "Northern Shrimp"                    ns
ns_s    "Northern Shrimp in Skjalfandi"      ns
ns_a    "Northern Shrimp in Arnarfjordur"    ns
ns_i    "Northern Shrimp in Isafjardardjup"  ns
as      "Aesop Shrimp"                       as
as_s    "Aesop Shrimp in Skjalfandi"         as
'))

Notice that we have used the t_group column to define groupings within our populations. This means that the ns group will include all samples from ns, ns_s, ns_a and ns_i.
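
To make the grouping rule concrete, here is a minimal base R sketch of which populations fall under a given group. The helper `population_members` is hypothetical and is not part of mfdb; it only mirrors the rule described above (a group contains itself plus every row whose t_group points at it) and says nothing about how mfdb resolves groups internally.

```r
# The taxonomy table from above, parsed with base R
taxonomy <- read.table(text = '
name    description                          t_group
ns      "Northern Shrimp"                    ns
ns_s    "Northern Shrimp in Skjalfandi"      ns
ns_a    "Northern Shrimp in Arnarfjordur"    ns
ns_i    "Northern Shrimp in Isafjardardjup"  ns
as      "Aesop Shrimp"                       as
as_s    "Aesop Shrimp in Skjalfandi"         as
', header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper (not an mfdb function): list the populations
# that belong to a group, i.e. the group itself and its t_group children
population_members <- function (tax, group) {
    tax$name[tax$name == group | tax$t_group == group]
}

population_members(taxonomy, 'ns')  # ns, ns_s, ns_a, ns_i
population_members(taxonomy, 'as')  # as, as_s
```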

Now we can import data that uses these groupings:

mfdb_import_survey(mdb, data_source = "x",
table_string("
year    month   areacell   species population length  count
2019    1       45G01      PRA     ns_s     10      285
2019    1       45G01      PRA     ns_s     20      273

2019    1       45G01      PRA     ns_a     10      299
2019    1       45G01      PRA     ns_a     20      252

2019    1       45G01      PRA     ns_i     10      193
2019    1       45G01      PRA     ns_i     20      322
"))

Querying data

We can now use the mfdb_sample_* functions to select this data back out again, grouping and filtering by population. We can query for individual fjords as well as the whole group:

agg_data <- mfdb_sample_count(mdb, c('population', 'length'), list(
        population = mfdb_group(ns_s = 'ns_s', ns = 'ns'),
        length = mfdb_unaggregated()))
agg_data

## $`0.0.0.0.0`
##   year step area population length number
## 1  all  all  all         ns     10    777
## 2  all  all  all         ns     20    847
## 3  all  all  all       ns_s     10    285
## 4  all  all  all       ns_s     20    273
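
The totals above can be cross-checked by hand. The following base R sketch re-creates the imported survey rows and sums the counts per length, once for every row (all three populations roll up to ns via t_group) and once for ns_s alone. It does not touch the database; the variable names are illustrative.

```r
# The survey rows imported earlier, as a plain data.frame
survey <- read.table(text = "
year    month   areacell   species population length  count
2019    1       45G01      PRA     ns_s     10      285
2019    1       45G01      PRA     ns_s     20      273
2019    1       45G01      PRA     ns_a     10      299
2019    1       45G01      PRA     ns_a     20      252
2019    1       45G01      PRA     ns_i     10      193
2019    1       45G01      PRA     ns_i     20      322
", header = TRUE, stringsAsFactors = FALSE)

# ns_s, ns_a and ns_i all have t_group 'ns', so the 'ns' group
# is the sum over every row, per length
ns_total <- aggregate(count ~ length, data = survey, FUN = sum)

# The 'ns_s' group only contains its own rows
ns_s_only <- aggregate(count ~ length,
    data = survey[survey$population == 'ns_s', ], FUN = sum)

ns_total   # length 10: 777, length 20: 847
ns_s_only  # length 10: 285, length 20: 273
```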

mfdb_disconnect(mdb)