Add new records for MS measurement files to msrawfiles index
Source: R/msrawfiles-addRecord.R
Usage
addRawfiles(
  rfIndex,
  templateId,
  newPaths,
  newStart = "filename",
  newStation = "same_as_template",
  dirMeasurmentFiles = "/srv/cifs-mounts/g2/G/G2/HRMS/Messdaten/",
  promptBeforeIngest = TRUE,
  saveDirectory = getwd(),
  newStationList = list(station = "example_new_station", loc = list(lat = 1.23456, lon =
    1.23456), river = "example_new_river", gkz = 99999, km = 99999)
)
Arguments
- templateId
ES-ID for an existing msrawfiles-record to use as a template
- newPaths
Character vector of full paths to the new raw files to be added. These must be mzXML files.
- newStart
Either "filename" (default), meaning the start time is extracted from the filename, or a start date given as an 8-digit number "YYYYMMDD".
- newStation
One of "same_as_template" (default), which copies the value from the template document; "filename", which extracts the station and loc values from the code in the filename (by comparing to other docs in the msrawfiles index); or "newStationList", which adds a new station. See details.
- dirMeasurmentFiles
Root directory where original measurement files are located. The function will look for original (vendor) files in this directory or below. Must end with "/".
- promptBeforeIngest
Should the user be asked to verify the submission? (Default: TRUE). A file called "add-rawfiles-check.json" is created in the saveDirectory, which the user can check and make changes to. See details "Making manual changes".
- saveDirectory
Location to save temporary json file for checking and making manual changes. (defaults to current working dir)
- newStationList
List with fields "station" and "loc" and optionally "river", "gkz" and "km" to add a new station. See details.
- rfIndex
Name of the msrawfiles index
Details
Adding location information (`station` and `loc` fields)
`newStation` can be either "same_as_template", "filename" or "newStationList".
Using "same_as_template" means the station and location information is copied from the template.
Using "filename" means it will use the `dbas_station_regex` field in the index to get the station information from another record in the index. The function will use the regex to extract the station code from the filename and search the msrawfiles index for other records with this station code. If an unambiguous location is found, it will gather all available location information and use this for the new record entry. This option is useful if the batch that is being added contains more than one station.
Using "newStationList" means a new fixed station name and location are passed via the argument `newStationList`, a list with fields "station" and "loc" and optionally "river", "gkz" and "km". "loc" is a geopoint with fields "lat" and "lon". Station and river names should all be lowercase with no spaces and no special characters. For the station name use the convention <river>_<town>_<position>, where position is l, r or m for left, right or middle. If there is no obvious town but km is known, use the convention <river>_<km>. If neither town nor km is known, use the convention <river>_<description>, where description is some indication of the location.
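The naming convention can be checked before a new station is submitted. The following is a minimal sketch in base R. `isValidStationName` is a hypothetical helper, not part of ntsportal, and it assumes that "no special characters" means only ASCII lowercase letters, digits and underscores are allowed.

```r
# Hypothetical helper (not part of ntsportal).
# Assumption: valid names are underscore-separated runs of a-z and 0-9.
isValidStationName <- function(x) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)*$", x)
}

# Two names that follow the convention and one that does not
nameCheck <- isValidStationName(c("donau_ul_m", "pleisse_9", "Donau Ulm"))

# A list in the shape described for the newStationList argument,
# reusing the values from the Examples section
stationList <- list(
  station = "pleisse_9",
  loc = list(lat = 51.251489, lon = 12.383758),
  river = "pleisse",
  km = 9,
  gkz = 5666
)

# Stop early if the station or river name breaks the convention
stopifnot(
  isValidStationName(stationList$station),
  isValidStationName(stationList$river)
)
```

A check like this only covers the character rules. Whether the name really describes the right river, town or position still has to be verified by hand.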
Adding sample time information (`start` field)
Using "filename" means that the sample time is extracted from the filename. The field `dbas_date_regex` is used to find the date of the sample in the file name. It works in conjunction with `dbas_date_format`: `dbas_date_regex` extracts the text, while `dbas_date_format` tells R how to interpret the text. For example, the file name `RH_pos_20170101.mzXML` uses the date_regex `"([20]*\d{6})"` and date_format `"ymd"`. The `dbas_date_regex` uses the tidyverse regular expression syntax and the `stringr::str_match` function to extract the text referring to the date. The brackets indicate the text to extract, and these can be surrounded by anchors. It is also possible to have multiple brackets, in which case the text from all brackets is combined before parsing. For example, the file `UEBMS_2024_002_Main_Kahl_Jan_pos_DDA.mzXML` can be parsed with date_regex `_(20\d{2})_.*_(\w{3})_pos` and date_format `ym`.
`dbas_date_format` may be one of `"ymd"`, `"dmy"`, `"ym"` for year-month and `"yy"` for just the year. The date parsing is done by `lubridate`.
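The extraction step can be tried out on a file name before the regex is stored in the index. The sketch below uses base R (`regexec` with `perl = TRUE` and `as.Date`) as a stand-in for `stringr::str_match` and `lubridate`, so small differences in regex dialect are possible. `extractDateText` is a hypothetical helper, not part of ntsportal. Note that backslashes must be doubled inside R string literals.

```r
# Hypothetical helper (not part of ntsportal): apply a date regex to a file
# name and paste all captured groups together before parsing.
extractDateText <- function(filename, dateRegex) {
  m <- regmatches(filename, regexec(dateRegex, filename, perl = TRUE))[[1]]
  if (length(m) < 2) {
    return(NA_character_)  # no match, or the regex has no capture group
  }
  paste(m[-1], collapse = "")  # drop the full match, keep only the groups
}

# Single capture group: the whole 8-digit date
txt1 <- extractDateText("RH_pos_20170101.mzXML", "([20]*\\d{6})")

# Two capture groups: year and month name are combined into one string
txt2 <- extractDateText(
  "UEBMS_2024_002_Main_Kahl_Jan_pos_DDA.mzXML",
  "_(20\\d{2})_.*_(\\w{3})_pos"
)

# Base R counterpart of interpreting txt1 with the "ymd" format
date1 <- as.Date(txt1, format = "%Y%m%d")
```

Here `txt1` holds the text that would be interpreted as `"ymd"` and `txt2` the combined text that would be interpreted as `"ym"`.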
Making manual changes
Manual changes can be made to the json, but only if a single document is being added (this is to minimize errors). Select "c" at the prompt (after changes to the json have been saved).
File storage locations
The location where the mzXML files are stored is given in the argument `newPaths`. `saveDirectory` is where the json file is written to. `dirMeasurmentFiles` is only used to look for the original (non-converted) vendor files (e.g., wiff files) to read the measurement time (which some converters do not copy into the mzXML file). Depending on the number of files being uploaded, the function can be very slow because it has to search a large directory. To improve performance, point `dirMeasurmentFiles` at a smaller directory. If no vendor files are found, the function uses the creation time of the mzXML file instead.
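Since `dirMeasurmentFiles` must end with "/", it can help to normalize the path before calling `addRawfiles`. The following is a minimal base R sketch. `withTrailingSlash` is a hypothetical helper, not part of ntsportal, and the sub-directory is the one used in the Examples section.

```r
# Hypothetical helper (not part of ntsportal): append "/" only if missing
withTrailingSlash <- function(path) {
  if (endsWith(path, "/")) path else paste0(path, "/")
}

# Point the vendor-file search at a smaller sub-directory
vendorDir <- withTrailingSlash("~/messdaten/sachsen")
```

The resulting `vendorDir` could then be passed as `dirMeasurmentFiles = vendorDir`.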
Examples
if (FALSE) { # \dontrun{
library(ntsportal)
connectNtsportal()
rfindex <- "ntsp_msrawfiles"
paths <- list.files("/beegfs/nts/ntsportal/msrawfiles/ulm/schwebstoff/dou_pos/", "^Ulm.*mzXML$", full.names = TRUE)
templateId <- findTemplateId(rfindex, blank = FALSE, pol = "pos", station = "donau_ul_m", matrix = "spm")
addRawfiles(rfIndex = rfindex, templateId = templateId, newPaths = paths)
checkMsrawfiles()
# Files in batch have different sample locations
addRawfiles(rfindex, "eIRBnYkBcjCrX8D7v4H5", newFiles[4:17], newStation = "filename",
  dirMeasurmentFiles = "~/messdaten/sachsen/")
# Addition of a new station
addRawfiles(
  rfindex, "eIRBnYkBcjCrX8D7v4H5", newFiles[15],
  newStation = "newStationList",
  newStationList = list(
    station = "pleisse_9",
    loc = list(lat = 51.251489, lon = 12.383758),
    river = "pleisse",
    km = 9,
    gkz = 5666
  )
)
} # }