MareFrame Data Import functions — mfdb_import

Functions to import data into MareFrame DB

Usage

mfdb_import_temperature(mdb, data_in)
mfdb_import_survey(mdb, data_in, data_source = 'default_sample')
mfdb_import_survey_index(mdb, data_in, data_source = 'default_index')
mfdb_import_stomach(mdb, predator_data, prey_data, data_source = "default_stomach")

Arguments

mdb: Database connection created by mfdb().
data_in, predator_data, prey_data: A data.frame of data to import; see Details for the expected columns.
data_source: A name for this data, e.g. the filename it came from. Used so you can replace it later without disturbing other data.

Details

All functions replace existing data in the case study with the new data. If you specify a data_source, only existing data with the same data_source is replaced.

If you want to remove the data, import empty data.frames with the same data_source.
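As a minimal sketch of the removal step, a zero-row data.frame that keeps the original columns can be built in base R. The data_source name 'cod-1998' and the sample values are only illustrative, and the mfdb call is left as a comment because it needs an open connection.

```r
# Data that is assumed to have been imported earlier under 'cod-1998'
data_in <- data.frame(
    year = c(1998, 1998),
    month = c(1, 1),
    areacell = c('35F1', '35F1'),
    species = c('COD', 'COD'),
    length = c(140, 150),
    stringsAsFactors = FALSE)

# Zero-row data.frame with the same column names and types
empty_data <- data_in[0, , drop = FALSE]

# With an open connection mdb, re-importing under the same data_source
# would replace the earlier rows with nothing:
# mfdb_import_survey(mdb, empty_data, data_source = 'cod-1998')
```

Data imported under any other data_source should be left untouched by this call.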

mfdb_import_temperature imports temperature time-series data for areacells. The data_in should be a data.frame with the following columns:

id: A numeric ID for this areacell (will be combined with the case study number internally)
year: Required. Year each sample was taken, e.g. c(2000,2001)
month: Required. Month (1-12) each sample was taken, e.g. c(1,12)
areacell: Required. Areacell sample was taken within
temperature: The temperature at given location/time
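A temperature table with these columns might be assembled as in the sketch below. The values are made up, and the choice to reuse the same id for every row of an areacell is an assumption based on the column description rather than documented behaviour.

```r
# One row per areacell per month; values are made up for illustration
temperature_data <- data.frame(
    id = c(1, 1, 2, 2),          # assumed: one numeric ID per areacell
    year = c(2000, 2000, 2000, 2000),
    month = c(1, 2, 1, 2),
    areacell = c('35F1', '35F1', '35F2', '35F2'),
    temperature = c(4.5, 4.1, 5.2, 4.8),
    stringsAsFactors = FALSE)

# Basic checks before importing: months in range, required fields present
stopifnot(all(temperature_data$month %in% 1:12))
stopifnot(!anyNA(temperature_data[, c('year', 'month', 'areacell')]))

# With an open connection mdb:
# mfdb_import_temperature(mdb, temperature_data)
```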

mfdb_import_survey imports institution surveys and commercial sampling for your case study. The data_in should be a data.frame with the following columns:

institute: Optional. An institute name, see mfdb::institute for possible values
gear: Optional. Gear name, see mfdb::gear for possible values
vessel: Optional. Vessel defined previously with mfdb_import_vessel_taxonomy(...)
tow: Optional. Tow defined previously with mfdb_import_tow_taxonomy(...)
sampling_type: Optional. A sampling_type, see mfdb::sampling_type for possible values
year: Required. Year each sample was taken, e.g. c(2000,2001)
month: Required. Month (1-12) each sample was taken, e.g. c(1,12)
areacell: Required. Areacell sample was taken within
species: Optional, default c(NA). Species of sample, see mfdb::species for possible values
age: Optional, default c(NA). Age of sample, or mean age
sex: Optional, default c(NA). Sex of sample, see mfdb::sex for possible values
length: Optional, default c(NA). Length of sample / mean length of all samples
length_var: Optional, default c(NA). Sample variance, if data is already aggregated
length_min: Optional, default c(NA). Minimum theoretical length, if data is already aggregated
weight: Optional, default c(NA). Weight of sample / mean weight of all samples
weight_var: Optional, default c(NA). Sample variance, if data is already aggregated
weight_total: Optional, default c(NA). Total weight of all samples, can be used with count = NA to represent an unknown number of samples
liver_weight: Optional, default c(NA). Liver weight of sample / mean liver weight of all samples
liver_weight_var: Optional, default c(NA). Sample variance, if data is already aggregated
gonad_weight: Optional, default c(NA). Gonad weight of sample / mean gonad weight of all samples
gonad_weight_var: Optional, default c(NA). Sample variance, if data is already aggregated
stomach_weight: Optional, default c(NA). Stomach weight of sample / mean stomach weight of all samples
stomach_weight_var: Optional, default c(NA). Sample variance, if data is already aggregated
count: Optional, default c(1). Number of samples this row represents (i.e. if the data is aggregated)
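To illustrate how the length, length_var and count columns relate when data is already aggregated, the sketch below collapses individual measurements into one row per group using base R. The measurements are made up, and the grouping columns chosen here are only one possible choice.

```r
# Individual length measurements (values made up for illustration)
individuals <- data.frame(
    year = 1998,
    month = 1,
    areacell = '35F1',
    species = 'COD',
    sex = c('M', 'M', 'F', 'F', 'F'),
    length = c(140, 150, 150, 160, 170),
    stringsAsFactors = FALSE)

# Summarise length within each combination of the identifying columns
groups <- individuals[, c('year', 'month', 'areacell', 'species', 'sex')]
aggregated <- aggregate(
    individuals$length,
    by = groups,
    FUN = function(x) c(mean = mean(x), var = var(x), n = length(x)))

# One row per group: mean length, sample variance, number of samples
data_in <- data.frame(
    aggregated[, names(groups)],
    length = aggregated$x[, 'mean'],
    length_var = aggregated$x[, 'var'],
    count = aggregated$x[, 'n'],
    stringsAsFactors = FALSE)

# With an open connection mdb:
# mfdb_import_survey(mdb, data_in, data_source = 'cod-1998-aggregated')
```

Here var() gives the sample variance, matching the description of length_var, and count records how many individuals each row stands for.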

mfdb_import_survey_index adds indices that can be used, for example, as abundance information. Before using mfdb_import_survey_index, make sure that the index_type you intend to use exists by defining it with mfdb_import_cs_taxonomy. The data_in should be a data.frame with the following columns:

index_type: Required. The name of the index data you are storing, e.g. 'acoustic'
year: Required. Year each sample was taken, e.g. c(2000,2001)
month: Required. Month (1-12) each sample was taken, e.g. c(1,12)
areacell: Required. Areacell sample was taken within
value: Value of the index at this point in space/time
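An index table with these columns could look like the sketch below. The index values and the data_source name 'acoustic-index' are made up, and the commented mfdb_import_cs_taxonomy call assumes the (mdb, taxonomy_name, data_in) argument order, so check it against that function's own help page.

```r
# Two observations per year of a hypothetical acoustic index
index_data <- data.frame(
    index_type = 'acoustic',
    year = c(2000, 2000, 2001, 2001),
    month = c(3, 9, 3, 9),
    areacell = '35F1',
    value = c(12.5, 9.8, 14.1, 10.3),
    stringsAsFactors = FALSE)

# Names that must exist in the index_type vocabulary before importing
needed_types <- unique(index_data$index_type)

# With an open connection mdb, the two steps would be:
# mfdb_import_cs_taxonomy(mdb, 'index_type', data.frame(name = needed_types))
# mfdb_import_survey_index(mdb, index_data, data_source = 'acoustic-index')
```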

mfdb_import_stomach imports data on predators and prey. The predator and prey data are stored separately, but they must be linked by the stomach_name column. If a prey row has a stomach_name that doesn't match any predator, an error will be returned.

The predator_data should be a data.frame with the following columns:

stomach_name: Required. An arbitrary name that provides a link between the predator and prey tables
institute: Optional. An institute name, see mfdb::institute for possible values
gear: Optional. Gear name, see mfdb::gear for possible values
vessel: Optional. Vessel defined previously with mfdb_import_vessel_taxonomy(...)
tow: Optional. Tow defined previously with mfdb_import_tow_taxonomy(...)
sampling_type: Optional. A sampling_type, see mfdb::sampling_type for possible values
year: Required. Year each sample was taken, e.g. c(2000,2001)
month: Required. Month (1-12) each sample was taken, e.g. c(1,12)
areacell: Required. Areacell sample was taken within
species: Optional, default c(NA). Species of sample, see mfdb::species for possible values
age: Optional, default c(NA). Age of sample, or mean age
sex: Optional, default c(NA). Sex of sample, see mfdb::sex for possible values
maturity_stage: Optional, default c(NA). Maturity stage of sample, see mfdb::maturity_stage for possible values
stomach_state: Optional, default c(NA). Stomach state of sample, see mfdb::stomach_state for possible values
length: Optional, default c(NA). Length of sample
weight: Optional, default c(NA). Weight of sample

The prey_data should be a data.frame with the following columns:

stomach_name: Required. The stomach name of the predator this was found in
species: Optional, default c(NA). Species of sample, see mfdb::species for possible values
digestion_stage: Optional, default c(NA). Stage of digestion of the sample, see mfdb::digestion_stage for possible values
length: Optional, default c(NA). Length of sample / mean length of all samples
weight: Optional, default c(NA). Weight of sample / mean weight of all samples
weight_total: Optional, default c(NA). Total weight of all samples
count: Optional, default c(NA). Number of samples this row represents (i.e. if the data is aggregated), count = NA represents an unknown number of samples
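Because every prey row has to match a predator through stomach_name, it can help to check the link before importing. The sketch below uses made-up values and a hypothetical data_source name, and does the check in base R.

```r
# Two predator stomachs, identified by arbitrary names
predator_data <- data.frame(
    stomach_name = c('A', 'B'),
    year = 2000,
    month = 1,
    areacell = '35F1',
    species = 'COD',
    length = c(42, 55),
    stringsAsFactors = FALSE)

# Prey found in those stomachs; each row is an aggregated group of prey
prey_data <- data.frame(
    stomach_name = c('A', 'A', 'B'),
    species = 'CAP',
    length = c(1, 4, 1),
    weight = c(10, 40, 10),
    count = c(5, 1, 5),
    stringsAsFactors = FALSE)

# Prey rows whose stomach_name has no matching predator would cause an error
orphans <- setdiff(prey_data$stomach_name, predator_data$stomach_name)
stopifnot(length(orphans) == 0)

# With an open connection mdb:
# mfdb_import_stomach(mdb, predator_data, prey_data, data_source = 'cod-stomachs-2000')
```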

Value

NULL

Examples

mdb <- mfdb(tempfile(fileext = '.duckdb'))
#> 2022-11-16 12:34:33 INFO:mfdb:Creating schema from scratch
#> 2022-11-16 12:34:33 INFO:mfdb:Taxonomy market_category no updates to make
#> 2022-11-16 12:34:33 INFO:mfdb:Schema up-to-date

# We need to set-up vocabularies first
mfdb_import_area(mdb, data.frame(
    id = c(1,2,3),
    name = c('35F1', '35F2', '35F3'),
    size = c(5)))
mfdb_import_vessel_taxonomy(mdb, data.frame(
    name = c('1.RSH', '2.COM'),
    stringsAsFactors = FALSE))
mfdb_import_sampling_type(mdb, data.frame(
    name = c("RES", "LND"),
    description = c("Research", "Landings"),
    stringsAsFactors = FALSE))

data_in <- read.csv(text = '
year,month,areacell,species,age,sex,length
1998,1,35F1,COD,3,M,140
1998,1,35F1,COD,3,M,150
1998,1,35F1,COD,3,F,150
')

data_in$institute <- 'MRI'
data_in$gear <- 'GIL'
data_in$vessel <- '1.RSH'
data_in$sampling_type <- 'RES'
mfdb_import_survey(mdb, data_in, data_source = 'cod-1998')

mfdb_disconnect(mdb)