Add new records for MS measurement files to msrawfiles index

Usage

addRawfiles(
  rfIndex,
  templateId,
  newPaths,
  newStart = "filename",
  newStation = "same_as_template",
  dirMeasurmentFiles = "/srv/cifs-mounts/g2/G/G2/HRMS/Messdaten/",
  promptBeforeIngest = TRUE,
  saveDirectory = getwd(),
  newStationList = list(station = "example_new_station", loc = list(lat = 1.23456, lon =
    1.23456), river = "example_new_river", gkz = 99999, km = 99999)
)

Arguments

templateId

ES-ID for an existing msrawfiles-record to use as a template

newPaths

Character vector of full paths to the new rawfiles to be added. These must be mzXML files.

newStart

Either "filename" (default), meaning the start time is extracted from the filename, or a start date given as an 8-digit string "YYYYMMDD".

newStation

Either "same_as_template" (default), which copies the value from the template document, "filename", which extracts the station and loc values from the code in the filename (by comparing to other docs in the msrawfiles index), or "newStationList", which adds a new station. See details.

dirMeasurmentFiles

Root directory where original measurement files are located. The function will look for original (vendor) files in this directory or below. Must end with "/".

promptBeforeIngest

Should the user be asked to verify the submission? (Default: TRUE). A file called "add-rawfiles-check.json" is created in the saveDirectory, which the user can check and make changes to. See details, "Making manual changes".

saveDirectory

Location to save the temporary json file for checking and making manual changes (defaults to the current working directory).

newStationList

List with fields "station" and "loc" and optionally "river", "gkz" and "km" to add a new station. See details.

rfIndex

Name of the msrawfiles index

Value

Returns vector of new ES-IDs of the generated and imported documents

Details

Adding location information (`station` and `loc` fields)

newStation can be either "same_as_template", "filename" or "newStationList".

Using "same_as_template" means the station and location information is copied from the template.

Using "filename" means it will use the `dbas_station_regex` field in the index to get the station information from another record in the index. The function will use the regex to extract the station code from the filename and search the msrawfiles index for other records with this station code. If an unambiguous location is found, it will gather all available location information and use this for the new record entry. This option is useful if the batch that is being added contains more than one station.
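The lookup described for "filename" can be illustrated with a small base R sketch. Everything in it is hypothetical: the `records` data frame stands in for documents in the msrawfiles index, the station regex, station names and coordinates are invented, and `extractCode` and `lookupStation` are illustrative helpers, not functions of the package. It only shows the idea of extracting a code from the filename and accepting the result when exactly one location matches.

```r
# Hypothetical stand-in for records already present in the msrawfiles index
records <- data.frame(
  filename = c("RH_pos_20170101.mzXML", "RH_pos_20170201.mzXML", "KO_pos_20170101.mzXML"),
  station = c("rhein_ko_l", "rhein_ko_l", "mosel_ko_r"),
  lat = c(50.35, 50.35, 50.36),
  lon = c(7.60, 7.60, 7.59),
  stringsAsFactors = FALSE
)

# Assumed station regex: the code is the leading run of capital letters
stationRegex <- "^([A-Z]+)_"

# Return the first capture group, or NA if the regex does not match
extractCode <- function(filename, regex) {
  m <- regmatches(filename, regexec(regex, filename, perl = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Find the station for a new file by comparing its code to existing records
lookupStation <- function(newFile, records, regex) {
  code <- extractCode(basename(newFile), regex)
  if (is.na(code)) stop("No station code found in ", newFile)
  recordCodes <- vapply(records$filename, extractCode, character(1),
                        regex = regex, USE.NAMES = FALSE)
  hits <- unique(records[!is.na(recordCodes) & recordCodes == code,
                         c("station", "lat", "lon")])
  # Only an unambiguous result (exactly one location) is accepted
  if (nrow(hits) != 1) stop("Station for code '", code, "' is missing or ambiguous")
  as.list(hits)
}

stationInfo <- lookupStation("/data/RH_pos_20180301.mzXML", records, stationRegex)
```

Because each file is resolved on its own, a batch containing several station codes can be handled in one call, which is the situation the "filename" option is meant for.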

Using "newStationList" means a new fixed station name and location is passed (use a list in the argument "newStationList" with fields "station" and "loc" and optionally "river", "gkz" and "km"), "loc" being a geopoint with fields "lat" and "lon". Station and river names should all be lowercase with no spaces and no special characters. For the station name use the convention <river>_<town>_<position>, where position is l, r or m for left, right or middle. If there is no obvious town but km is known, use the convention <river>_<km>. If neither town nor km is known, use the convention <river>_<description>, where description is some indication of the location.
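As a rough illustration of the naming convention, the following base R sketch checks a `newStationList` before it is passed to `addRawfiles`. The helper `isValidStationName` is hypothetical (it is not part of the package), and the pattern it uses is only one possible reading of "lowercase, no spaces, no special characters", treating the underscore as the part separator.

```r
# Hypothetical helper: TRUE if the name consists of lowercase letters or
# digits in at least two underscore-separated parts, e.g. "donau_ul_m"
isValidStationName <- function(x) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)+$", x)
}

# Station following the <river>_<km> convention
newStationList <- list(
  station = "pleisse_9",
  loc = list(lat = 51.251489, lon = 12.383758),
  river = "pleisse",
  km = 9,
  gkz = 5666
)

# Basic sanity checks before submitting the list
stopifnot(
  isValidStationName(newStationList$station),
  startsWith(newStationList$station, paste0(newStationList$river, "_")),
  all(c("lat", "lon") %in% names(newStationList$loc))
)
```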

Adding sample time information (`start` field)

Setting newStart to "filename" means that the sample time is extracted from the filename. The field `dbas_date_regex` is used to find the date of the sample in the file name and works in conjunction with `dbas_date_format`: `dbas_date_regex` extracts the text, while `dbas_date_format` tells R how to interpret it. For example, the file name `RH_pos_20170101.mzXML` uses the date_regex `"([20]*\d{6})"` and date_format `"ymd"`. The `dbas_date_regex` uses the tidyverse regular expression syntax and the `stringr::str_match` function to extract the text referring to the date. The brackets mark the text to extract and can be surrounded by anchors. Multiple brackets are also possible; the text from all brackets is combined before parsing. For example, the file `UEBMS_2024_002_Main_Kahl_Jan_pos_DDA.mzXML` can be parsed with date_regex `_(20\d{2})_.*_(\w{3})_pos` and date_format `ym`.

`dbas_date_format` may be one of `"ymd"`, `"dmy"`, `"ym"` for year-month and `"yy"` for just the year. The date parsing is done by `lubridate`.
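The two-step extraction can be tried out with a short sketch. According to the text the function itself relies on `stringr::str_match` and `lubridate`; the sketch below swaps in the base R equivalents `regexec`/`regmatches` and `as.Date` so that it runs without extra packages. The helper `extractDateText` is hypothetical, the quantifiers in the patterns are written as `{6}`, `{2}` and `{3}`, and joining the capture groups without a separator is an assumption about how the text is "combined".

```r
# Hypothetical helper: apply a date regex and join all capture groups
extractDateText <- function(filename, dateRegex) {
  m <- regmatches(filename, regexec(dateRegex, filename, perl = TRUE))[[1]]
  if (length(m) < 2) return(NA_character_)
  # m[1] is the full match, m[-1] are the bracketed groups
  paste(m[-1], collapse = "")
}

# Single group, to be interpreted as year-month-day
txt1 <- extractDateText("RH_pos_20170101.mzXML", "([20]*\\d{6})")
start1 <- as.Date(txt1, format = "%Y%m%d")  # base R stand-in for "ymd"

# Two groups (year and month name), combined before parsing
txt2 <- extractDateText(
  "UEBMS_2024_002_Main_Kahl_Jan_pos_DDA.mzXML",
  "_(20\\d{2})_.*_(\\w{3})_pos"
)

print(txt1)    # "20170101"
print(start1)  # "2017-01-01"
print(txt2)    # "2024Jan"
```

Testing a new regex this way on a few representative filenames before calling `addRawfiles` makes it easier to spot files for which no date text is found (`NA`).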

Making manual changes

Manual changes can be made to the json, but only if a single document is being added (this is to minimize errors). After saving the changes to the json, select "c" at the prompt.

File storage locations

The location of the mzXML files is given in the argument `newPaths`. `saveDirectory` is where the json file is written. `dirMeasurmentFiles` is only used to look for the original (non-converted) vendor files (e.g., wiff files) in order to read the measurement time (which some converters do not copy into the mzXML file). Depending on the number of files being uploaded, the function can be very slow because it searches through a large directory. To improve performance, point `dirMeasurmentFiles` at a smaller directory. If no vendor files are found, the function uses the creation time of the mzXML file instead.

Examples

if (FALSE) { # \dontrun{
library(ntsportal)
connectNtsportal()
rfindex <- "ntsp_msrawfiles"
paths <- list.files("/beegfs/nts/ntsportal/msrawfiles/ulm/schwebstoff/dou_pos/", "^Ulm.*mzXML$", full.names = TRUE)
templateId <- findTemplateId(rfindex, blank = FALSE, pol = "pos", station = "donau_ul_m", matrix = "spm")
addRawfiles(rfIndex = rfindex, templateId = templateId, newPaths = paths)
checkMsrawfiles()

# Files in batch have different sample location
addRawfiles(rfindex, "eIRBnYkBcjCrX8D7v4H5", newFiles[4:17], newStation = "filename", 
dirMeasurmentFiles = "~/messdaten/sachsen/") 

# Addition of a new station
addRawfiles(
  rfindex, "eIRBnYkBcjCrX8D7v4H5", newFiles[15], 
  newStation = "newStationList",
  newStationList = list(
    station = "pleisse_9",
    loc = list(lat = 51.251489, lon = 12.383758),
    river = "pleisse",
    km = 9,
    gkz = 5666
  )
) 
} # }