MareFrame DB dplyr interface — mfdb

Use mfdb tables with dplyr

mfdb_dplyr_table(mdb, table_name, include_cols = all_cols)
mfdb_dplyr_survey_index(mdb, include_cols = all_cols)
mfdb_dplyr_division(mdb, include_cols = all_cols)
mfdb_dplyr_sample(mdb, include_cols = all_cols)
mfdb_dplyr_predator(mdb, include_cols = all_cols)
mfdb_dplyr_prey(mdb, include_cols = all_cols)

Arguments

mdb: An object created by mfdb()
table_name: Name of the table to query.
include_cols: Taxonomy columns to include in the output in addition to the measurement columns; see Details.

Details

Warning: Whilst these functions can be handy for exploration, there is no guarantee that code using them will continue to work from one version of MFDB to the next.

There is one function for each measurement table. The measurement columns are always returned. By default every possible taxonomy column is also joined in. This is inefficient if you do not need that data, in which case list only the taxonomy columns you require in include_cols. See mfdb::mfdb_taxonomy_tables for possible values.
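As a rough illustration of what include_cols saves, the sketch below compares the column count of the default query with a narrowed one. It assumes a fresh, empty DuckDB-backed database as in the Examples. colnames() on a lazy table lists the columns without fetching any rows. The exact counts depend on the MFDB version, so none are hard-coded.

```r
library(mfdb)

mdb <- mfdb(tempfile(fileext = '.duckdb'))

# Default: every taxonomy column is joined in
all_cols_tbl <- mfdb_dplyr_sample(mdb)
# Narrowed: measurement columns plus only the two taxonomy columns we need
few_cols_tbl <- mfdb_dplyr_sample(mdb, include_cols = c('data_source', 'species'))

# colnames() on a lazy table reads the column list only; no rows are fetched
n_all <- length(colnames(all_cols_tbl))
n_few <- length(colnames(few_cols_tbl))
has_requested <- all(c('data_source', 'species') %in% colnames(few_cols_tbl))

mfdb_disconnect(mdb)
```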

To query taxonomy tables, use mfdb_dplyr_table, which works for any supplied table name. See mfdb::mfdb_taxonomy_tables for possible values for table_name.
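The following is a minimal sketch of querying every taxonomy table in turn, for example to see which ones are populated. It assumes, as the paragraph above suggests, that every name in mfdb::mfdb_taxonomy_tables is a valid table_name. The row counts are computed in the database by count(), and pull() fetches only that single number.

```r
library(mfdb)
library(dplyr)

mdb <- mfdb(tempfile(fileext = '.duckdb'))

# One row count per taxonomy table, named by table
# (assumes every entry of mfdb_taxonomy_tables is queryable)
taxonomy_rows <- vapply(mfdb::mfdb_taxonomy_tables, function (tbl_name) {
  mfdb_dplyr_table(mdb, tbl_name) %>%
    count() %>%
    pull(n) %>%
    as.numeric()
}, numeric(1))

mfdb_disconnect(mdb)
print(taxonomy_rows)
```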

Value

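A lazy dplyr table object backed by the database, which can be refined further with dplyr verbs. Data is only retrieved when the result is printed or collect() is called.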
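Because the returned object is lazy, further dplyr verbs are translated to SQL rather than run in R. The sketch below, assuming an empty DuckDB-backed database as in the Examples, builds a per-year, per-species total of count and only executes it at collect(). The name yearly_counts is purely illustrative.

```r
library(mfdb)
library(dplyr)

mdb <- mfdb(tempfile(fileext = '.duckdb'))

# Build the query lazily; only the species taxonomy column is joined in
yearly_counts <- mfdb_dplyr_sample(mdb, include_cols = 'species') %>%
  filter(!is.na(count)) %>%
  group_by(year, species) %>%
  summarise(total_count = sum(count, na.rm = TRUE), .groups = 'drop')

# show_query(yearly_counts) would print the generated SQL without running it.
# collect() executes the SQL and returns an ordinary tibble.
result <- collect(yearly_counts)

mfdb_disconnect(mdb)
```

On a freshly created database the sample table is empty, so result has the expected columns but no rows.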

Examples

mdb <- mfdb(tempfile(fileext = '.duckdb'))
#> 2022-11-16 12:34:29 INFO:mfdb:Creating schema from scratch
#> 2022-11-16 12:34:30 INFO:mfdb:Taxonomy market_category no updates to make
#> 2022-11-16 12:34:30 INFO:mfdb:Schema up-to-date

# Include as many columns as possible
mfdb_dplyr_sample(mdb)
#> # Source:   SQL [0 x 64]
#> # Database: DuckDB 0.5.1 [unknown@Linux 5.15.0-1022-azure:R 4.2.2//tmp/RtmpZM4drC/filef62e42d6ae2a.duckdb]
#> # … with 64 variables: year <int>, month <int>, age <dbl>, length <dbl>,
#> #   length_var <dbl>, length_min <int>, weight <dbl>, weight_var <dbl>,
#> #   liver_weight <dbl>, liver_weight_var <dbl>, gonad_weight <dbl>,
#> #   gonad_weight_var <dbl>, stomach_weight <dbl>, stomach_weight_var <dbl>,
#> #   gutted_weight <dbl>, gutted_weight_var <dbl>, count <dbl>, areacell <chr>,
#> #   areacell_size <dbl>, areacell_depth <dbl>, data_source <chr>, gear <chr>,
#> #   gear_mesh_size <dbl>, gear_mesh_size_min <dbl>, gear_mesh_size_max <dbl>, …

# Only include 'data_source' and 'species' columns, as well as measurements
mfdb_dplyr_sample(mdb, c('data_source', 'species'))
#> # Source:   SQL [0 x 19]
#> # Database: DuckDB 0.5.1 [unknown@Linux 5.15.0-1022-azure:R 4.2.2//tmp/RtmpZM4drC/filef62e42d6ae2a.duckdb]
#> # … with 19 variables: year <int>, month <int>, age <dbl>, length <dbl>,
#> #   length_var <dbl>, length_min <int>, weight <dbl>, weight_var <dbl>,
#> #   liver_weight <dbl>, liver_weight_var <dbl>, gonad_weight <dbl>,
#> #   gonad_weight_var <dbl>, stomach_weight <dbl>, stomach_weight_var <dbl>,
#> #   gutted_weight <dbl>, gutted_weight_var <dbl>, count <dbl>,
#> #   data_source <chr>, species <chr>

# Query the sampling_type table
mfdb_dplyr_table(mdb, 'sampling_type')
#> # Source:   SQL [0 x 3]
#> # Database: DuckDB 0.5.1 [unknown@Linux 5.15.0-1022-azure:R 4.2.2//tmp/RtmpZM4drC/filef62e42d6ae2a.duckdb]
#> # … with 3 variables: sampling_type <chr>, t_group <chr>, description <chr>

mfdb_disconnect(mdb)