Build a cleaned eDNA sampling dataset from the raw eDNA-for-MPAs directory
Source: R/data_eDNA.R, data_eDNA.Rd
This function recursively scans the ../eDNA-for-MPAs/data/ directory to
locate all CSV files containing "GOTeDNA" in their file name or path.
The files are grouped by cruise mission, each metadata file is paired with
its data file, naming inconsistencies are corrected, and the results are
combined into a single table of sample-level information.
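The discovery and grouping step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name find_gotedna_files, the assumption that the first folder below data/ is the cruise folder, and the rule that files with "meta" in their name are metadata files are all hypothetical.

```r
# Minimal sketch (hypothetical helper): find GOTeDNA CSV files and group them by cruise.
# Assumes each cruise is a top-level folder under data_dir and that metadata files
# contain "meta" in their file name; the package may use different rules.
find_gotedna_files <- function(data_dir = "../eDNA-for-MPAs/data/") {
  data_dir <- sub("/+$", "", data_dir)  # drop trailing slashes before joining paths

  # All CSV files below data_dir, as paths relative to data_dir
  rel <- list.files(data_dir, pattern = "\\.csv$", recursive = TRUE,
                    ignore.case = TRUE)

  # Keep files whose name or folder path mentions GOTeDNA
  rel <- rel[grepl("GOTeDNA", rel, fixed = TRUE)]

  # First path component = cruise folder (assumption)
  cruise <- vapply(strsplit(rel, "/", fixed = TRUE), `[`, character(1), 1)
  paths <- file.path(data_dir, rel)
  is_meta <- grepl("meta", basename(rel), ignore.case = TRUE)

  # One list(metadata = ..., data = ...) entry per cruise
  lapply(split(seq_along(paths), cruise), function(idx) {
    list(metadata = paths[idx][is_meta[idx]],
         data     = paths[idx][!is_meta[idx]])
  })
}
```

Each element of the returned list, e.g. res[["CruiseA"]], holds the metadata and data paths for one cruise, so a later step can read them as a pair.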
Value
A data frame with the following columns:
- ID: Sample identifier
- date: Sampling date (class Date)
- latitude: Sample latitude
- longitude: Sample longitude (corrected to negative values)
- species_richness: Count of unique species detected in the sample
- method: eDNA sampling method, inferred from the folder structure
- location: Sampling location identifier
- year: Extracted sampling year
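The documented return shape can be illustrated with a hypothetical constructor. It assumes that year is stored as an integer derived from the date column; the package may store or derive it differently, and the method value in the test is a placeholder.

```r
# Hypothetical helper showing the documented output columns (not part of the package).
make_sample_table <- function(ID, date, latitude, longitude,
                              species_richness, method, location) {
  date <- as.Date(date)  # the documented class for the date column
  data.frame(
    ID = ID,
    date = date,
    latitude = latitude,
    longitude = longitude,
    species_richness = as.integer(species_richness),
    method = method,
    location = location,
    year = as.integer(format(date, "%Y")),  # assumption: year taken from date
    stringsAsFactors = FALSE
  )
}
```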
Details
The function performs several formatting corrections, including:
- resolving inconsistent materialSampleID naming (e.g., X, _, -)
- handling metadata files with different column-name conventions
- resolving location name inconsistencies (e.g., ESI, SAB)
- extracting sample dates, coordinates, richness, and method type
- correcting positive longitudes by flipping their sign
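Three of the corrections above can be sketched as small vectorized helpers. The function names and the exact rules are hypothetical; the documentation only names the characters involved (X, _, -) and example location codes (ESI, SAB).

```r
# Hypothetical cleaning helpers; the package's exact rules are not documented.

# Assumed ID rule: drop an "X" prefix that precedes a digit (like the prefix
# make.names() adds to names starting with a digit) and treat "_" as "-".
normalize_sample_id <- function(id) {
  id <- sub("^X([0-9])", "\\1", id)
  gsub("_", "-", id, fixed = TRUE)
}

# Assumed location rule: trim whitespace and upper-case codes such as " esi".
normalize_location <- function(loc) toupper(trimws(loc))

# Sites are assumed to lie west of Greenwich, so every longitude is made
# negative; NA values stay NA.
fix_longitude <- function(lon) -abs(lon)
```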
The final result is a single cleaned data frame with one row per sample.
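The one-row-per-sample result and its species_richness column can be illustrated with a base-R aggregation. The long input format (one row per detection, with ID and species columns) and the helper name count_richness are assumptions for illustration; the package's data files may be laid out differently, for example with species in a column named V6.

```r
# Hypothetical sketch: collapse long-format detections to one row per sample,
# counting unique species. Samples whose species are all missing are dropped
# rather than reported with a richness of 0.
count_richness <- function(detections) {
  d <- detections[!is.na(detections$species), c("ID", "species")]
  d <- unique(d)                 # a species counts once per sample
  counts <- table(d$ID)          # rows per ID = number of unique species
  data.frame(ID = as.character(names(counts)),
             species_richness = as.integer(counts),
             stringsAsFactors = FALSE)
}
```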
The function assumes the following:
- The directory ../eDNA-for-MPAs/data/ exists and contains cruise folders.
- Metadata files contain one of the columns materialSampleID, SampleID, or eDNA_Tube.
- Species columns in data files may be named Species, species, or V6.
- Latitude/longitude may be stored as decimalLatitude/decimalLongitude or Lat/Long.
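Because each field can arrive under several names, a small lookup that tries the documented alternatives in order is one way to handle these conventions. The helper name pick_column is hypothetical, and stopping with an error is a design choice for this sketch, not a description of what the package does.

```r
# Hypothetical helper: return the first column of df whose name appears in
# candidates (checked in order), failing with a clear error if none match.
pick_column <- function(df, candidates) {
  hit <- intersect(candidates, names(df))  # keeps the order of candidates
  if (length(hit) == 0) {
    stop("none of these columns found: ", paste(candidates, collapse = ", "))
  }
  df[[hit[1]]]
}
```

For example, pick_column(meta, c("decimalLatitude", "Lat")) returns the latitude vector whichever of the two conventions a given metadata file uses.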
The function prints progress messages (i, j, k, l) as it runs and may call
browser() when it encounters unexpected or inconsistent cases, which pauses
execution in the interactive debugger.