This article provides a comprehensive guide to the newly established minimum data standard for wildlife disease research and surveillance. Published in 2025, this standard addresses critical gaps in data sharing by defining 40 core data fields and 24 metadata fields to ensure transparency, interoperability, and reusability. Aimed at researchers, scientists, and drug development professionals, the content explores the standard's foundation in FAIR principles, offers step-by-step methodological application, addresses key troubleshooting concerns for sensitive data, and validates its utility through comparative analysis with existing frameworks. By adopting this standard, the scientific community can enhance global health security, improve pandemic preparedness, and facilitate the translation of ecological data into actionable biomedical insights.
In the field of wildlife disease ecology, data fragmentation and inconsistent reporting present significant barriers to understanding disease dynamics and mitigating emerging threats. Most current best practices for data sharing focus predominantly on pathogen genetic sequence data, while neglecting other critical facets of wildlife disease information [1]. This inconsistency is particularly evident in the widespread failure to report negative test results and comprehensive contextual metadata, which are essential for calculating accurate disease prevalence and understanding spatiotemporal patterns [1] [2]. The consequences of this fragmentation extend beyond academic circles, creating vulnerabilities in global health security by impeding early detection and response to zoonotic pathogens with pandemic potential [2] [3].
The lack of standardized formats means that even when data are shared, they often cannot be easily aggregated or compared across studies. A review of national health security capacity found that more than half (57.9%, 62/107) of reporting countries provided no evidence of a functional wildlife health surveillance program, and most countries (83.2%, 89/107) indicated specific gaps in operations, coordination, scope, or capacity [3]. This systematic neglect of wildlife and environmental considerations in health security priorities creates critical voids in our understanding of disease emergence and spread [3].
To address these challenges, researchers have proposed a minimum data and metadata reporting standard specifically designed for wildlife disease studies [1] [2]. This standard was developed through an iterative process incorporating: (i) experience conducting and publishing wildlife disease research; (ii) common practices already followed by scientists in the literature; (iii) best practices for sharing ecological data; and (iv) interoperability with standards used by other platforms such as the Global Biodiversity Information Facility (GBIF) [1].
The guiding philosophy of the data standard is that researchers should share their raw wildlife disease data in a "tidy data" format where each row corresponds to a single measurement - specifically, the outcome of a diagnostic test [1]. This structure accommodates the complex many-to-many relationships between tests, samples, and individual animals that commonly occur in wildlife disease research due to practices like repeated sampling, confirmatory tests, and sample pooling [1].
The data standard is designed for studies involving wild animal samples examined for parasites (including macroparasites, microparasites, and other pathogens), accompanied by information on diagnostic methods, and date and location of sampling [1]. Suitable project types include:
The standard specifically excludes data types better documented elsewhere, such as free-living macroparasite records (better suited to Darwin Core format), arthropod blood meal datasets, and environmental monitoring data not associated with specific animals [1].
Table 1: Required Core Data Fields in the Wildlife Disease Data Standard
| Field Category | Field Name | Description | Requirement Level |
|---|---|---|---|
| Sampling | Sample ID | Unique identifier for the sample | Required |
| Sampling | Sample type | Type of sample collected (e.g., oral swab, blood) | Required |
| Sampling | Collection date | Date when sample was collected | Required |
| Sampling | Location | Geographic location of sampling | Required |
| Host | Host species | Taxonomic identification of host | Required |
| Host | Animal ID | Unique identifier for individual animal | Conditional |
| Parasite | Test result | Outcome of diagnostic test | Required |
| Parasite | Test name | Name of diagnostic test used | Required |
| Parasite | Test target | Pathogen or agent targeted | Required |
The standard identifies 40 core data fields categorized into three groups: 11 related to sampling, 13 related to the host organism, and 16 related to the parasite itself [2]. Of these, 9 fields are designated as required (as detailed in Table 1), while the remainder are optional but recommended to provide sufficient context for interpretation and reuse [1] [4].
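As a rough sketch, a completeness check for the fields marked "Required" in Table 1 might look like the following. The keys reuse the display names from Table 1; the machine-readable column names in the official templates may differ, so treat them as assumptions. Animal ID is listed as conditional and is therefore not enforced here.

```python
# Display names taken from Table 1; official column names may differ (assumption).
REQUIRED_FIELDS = [
    "Sample ID",
    "Sample type",
    "Collection date",
    "Location",
    "Host species",
    "Test result",
    "Test name",
    "Test target",
]


def missing_required(row):
    """Return the required fields that are absent or blank in one data row."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Running `missing_required` over every row before sharing gives a quick list of records that need attention; it does not replace validation against the standard's own schema.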
Table 2: Required Metadata Fields in the Wildlife Disease Data Standard
| Metadata Category | Field Name | Description |
|---|---|---|
| Project Identification | Project title | Formal title of the research project |
| Project Identification | Project description | Brief summary of project objectives and scope |
| Personnel | Creator name(s) | Names of data creators |
| Personnel | Creator ORCID | Open Researcher and Contributor ID |
| Temporal Coverage | Date range | Start and end dates of data collection |
| Geographical Coverage | Location | Geographical scope of the study |
| Data Identification | Digital Object Identifier (DOI) | Persistent identifier for the dataset |
The standard includes 24 metadata fields (7 required) sufficient to document a dataset according to the DataCite Metadata Schema [1]. These fields ensure proper attribution, contextualization, and discoverability of shared datasets, aligning with FAIR (Findable, Accessible, Interoperable, and Reusable) principles [2].
Protocol Title: Implementation of Minimum Data Standard for Wildlife Disease Research
Purpose: To standardize the collection, formatting, and sharing of wildlife disease data according to the minimum data standard, enhancing interoperability and reuse potential.
Materials:
Procedure:
Fit for Purpose Assessment
Standard Tailoring
Data Formatting
Data Validation
Data Sharing
Table 3: Essential Research Reagents and Solutions for Wildlife Disease Studies
| Reagent/Solution | Function/Application | Implementation Example |
|---|---|---|
| Diagnostic Primers & Probes | Pathogen detection via PCR/rt-PCR | Forward/reverse primer sequences for coronavirus detection [1] |
| Sample Collection Kits | Non-invasive sampling | Oral and rectal swabs for bat coronavirus surveillance [1] |
| Taxonomic Reference Materials | Host species identification | Field guides, genetic barcodes for host species verification [1] |
| Data Validation Tools | Quality control of standardized data | wddsWizard R package for schema validation [1] |
| Template Files | Standardized data formatting | .csv and .xlsx templates from WDDS GitHub repository [1] |
| Geospatial Reference Data | Location standardization | GPS coordinates, gazetteers for spatial precision [1] |
The practical application of the standard is illustrated by a previously published dataset documenting a novel alphacoronavirus found in bats in Belize [1]. In this example:
The dataset was formatted according to the standard, with all mandatory and relevant fields completed, and cells left blank where fields were not applicable (e.g., parasite identity for negative test results) [1]. The complete standardized dataset is available on the PHAROS platform (project: prjRPayEvMecN) [1].
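The blank-cell convention can be sketched with Python's built-in `csv` module. The column names and the two records below are hypothetical placeholders, not rows from the Belize dataset; the point is only that a negative test keeps its row and leaves the parasite identity empty.

```python
import csv
import io

# Hypothetical column names and records, for illustration only.
FIELDS = ["sample_id", "test_result", "parasite_identity"]

rows = [
    {"sample_id": "EX-001", "test_result": "positive", "parasite_identity": "Alphacoronavirus"},
    # Negative result: the row is kept and the non-applicable cell is left blank.
    {"sample_id": "EX-002", "test_result": "negative", "parasite_identity": ""},
]


def to_csv(rows):
    """Serialize rows to CSV text, writing empty strings as blank cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (blank cells become '')."""
    return list(csv.DictReader(io.StringIO(text)))
```

Leaving the cell blank, rather than writing a placeholder such as "none", keeps the column free of values that could later be mistaken for a parasite name.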
For optimal reusability, data should be formatted as "tidy data" where each row corresponds to a single measurement (diagnostic test outcome) and each column represents a variable [1]. Researchers should:
The standard includes guidance for navigating potential safety concerns around data sharing, particularly regarding:
Recommended approaches include data obfuscation techniques that maintain scientific utility while preventing misuse, such as generalizing precise coordinates for sensitive species [1] [2].
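One simple way to generalize coordinates is to round them to fewer decimal places. The sketch below assumes decimal-degree inputs and a default of one decimal place (roughly 11 km of latitude); both the function name and the default are illustrative choices, and the appropriate precision depends on the species and the risk involved.

```python
def generalize_point(lat, lon, decimals=1):
    """Round a decimal-degree coordinate pair to a coarser precision.

    One decimal place corresponds to roughly 11 km in latitude.
    The default precision is an illustrative assumption.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return round(lat, decimals), round(lon, decimals)
```

For example, `generalize_point(17.18923, -88.49765)` (a made-up point) returns `(17.2, -88.5)`. When coordinates are generalized this way, it helps to document the resulting uncertainty alongside the data so that downstream users do not over-interpret the location.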
The adoption of this minimum data standard for wildlife disease research addresses the critical problem of fragmented and inconsistent data that has long impeded ecological understanding and health security preparedness. By providing a practical, flexible framework for data standardization while emphasizing the inclusion of negative results and comprehensive metadata, this approach enables the aggregation and comparison of datasets across studies and geographical regions [1] [2].
Widespread implementation of this standard will enhance the transparency, actionability, and reproducibility of wildlife disease research, ultimately strengthening our collective ability to detect, understand, and mitigate emerging infectious threats at the human-animal-environment interface [2]. As journal policies and funding mandates increasingly require open data, this standard provides a much-needed roadmap to meet those requirements without sacrificing flexibility or usability [2].
The Minimum Data Standard for Wildlife Disease Research and Surveillance represents a pivotal advancement in ecological and public health intelligence. Established through a collaboration of experts from academic and public health institutions and published in 2025, this framework addresses a critical barrier in wildlife disease ecology: fragmented and inconsistent data reporting [1] [2]. The standard is designed to enhance the transparency, reusability, and global utility of data related to pathogens in wild animals, thereby bolstering our collective capacity to detect and respond to emerging infectious threats at the human-animal-environment interface [2].
The guiding philosophy of the data standard is that researchers should share their raw wildlife disease data in a disaggregated format, often referred to as "tidy data," where each row corresponds to a single measurement, specifically the outcome of a diagnostic test [1]. This structure acknowledges the complex, many-to-many relationships between tests, samples, and individual animals that are common in wildlife disease studies [1].
The data standard is intentionally designed for flexibility and broad applicability across a diverse range of study designs and surveillance activities within wildlife disease ecology.
The standard is applicable to datasets that describe wild animal samples examined for parasites (including macroparasites, microparasites, viruses, and bacteria) and that include information on the diagnostic methods used and the date and location of sampling [1]. The following project types are explicitly identified as suitable for this standard [1]:
The standard's developers recognize that closely related types of data are better documented using more specialized frameworks. Consequently, the following data types are considered out of scope for this particular standard [1]:
The development of the minimum data standard was driven by several key objectives aligned with modern data science and global health security needs.
The standard is conceived as a minimal yet comprehensive set of fields [2]. It is designed to be accessible to a wide range of practitioners while providing sufficient structure for robust analysis [1]. A key design decision was to use open text fields for most entries rather than a restrictive controlled vocabulary, acknowledging the vast diversity of collection, detection, and measurement methods used in the field. This flexibility is intended to encourage broad community adoption, though the use of existing ontologies is encouraged where appropriate [1].
The standard is composed of a defined set of data and metadata fields designed to document diagnostic outcomes, sampling context, and host characteristics at the finest possible resolution [1] [2].
Table 1: Core Data Field Composition
| Category | Total Fields | Required Fields | Description |
|---|---|---|---|
| Sampling Data | 11 | Not Specified | Information related to the collection and processing of the sample. |
| Host Animal Data | 13 | Not Specified | Data concerning the animal from which the sample was taken (e.g., species, sex, age). |
| Parasite Data | 16 | Not Specified | Details on the diagnostic test, its result, and characterization of any detected parasite. |
| Total Core Data Fields | 40 | 9 | Records are disaggregated to the finest spatial, temporal, and taxonomic scale. |
Table 2: Project Metadata Field Composition
| Category | Total Fields | Required Fields | Description |
|---|---|---|---|
| Project Metadata | 24 | 7 | Information to document the project context, such as creators, funding, and methodology. This aligns with the DataCite Metadata Schema [1]. |
Implementing the minimum data standard in a research project involves a series of methodical steps. The following workflow diagram outlines the key stages from planning to data sharing.
Templates in .csv and .xlsx formats are available to facilitate this process [1].
Datasets can be checked with the R package wddsWizard, which offers convenience functions for validation [1] [5].
To effectively implement and utilize the minimum data standard, researchers can leverage a suite of tools and resources.
Table 3: Essential Research Reagent Solutions
| Tool / Resource | Type | Function | Access |
|---|---|---|---|
| Data & Metadata Templates | Template Files | Pre-formatted .csv and .xlsx files providing the correct structure for data entry. | GitHub: viralemergence/wdds [1] |
| wddsWizard R Package | Software Package | An R package to restructure datasets, validate them against the standard, and facilitate compliance. | R-universe: viralemergence.r-universe.dev [5] |
| JSON Schema | Validation Schema | A machine-readable schema that defines the standard's structure and rules for formal data validation. | GitHub: viralemergence/wdds [1] |
| PHAROS Database | Data Repository | A dedicated platform for wildlife disease data where standardized datasets can be shared and explored. | pharos.viralemergence.org [1] |
| DataCite Schema | Metadata Standard | The underlying metadata schema used for project-level documentation, promoting interoperability with repositories. | DataCite [1] |
The 2025 Minimum Data Standard for wildlife disease research and surveillance provides a much-needed foundation for robust, collaborative, and actionable science. By offering a practical, flexible, and FAIR-aligned framework, it empowers researchers to share their data in a way that maximizes its utility for addressing pressing questions in ecology, conservation, and global health. Widespread adoption of this standard will significantly enhance our ability to understand and mitigate the risks of emerging infectious diseases in a rapidly changing world.
The field of wildlife disease ecology has long been hampered by fragmented and inconsistent data reporting. Most existing best practices for data sharing focus primarily on pathogen genetic sequence data, while other critical facets of wildlife disease data, particularly negative results, are often withheld or summarized in descriptive tables with limited metadata [1]. This lack of standardization creates significant barriers to data aggregation, synthesis, and re-use, ultimately impeding our ability to track emerging zoonotic threats and understand disease dynamics across ecosystems.
To address these challenges, researchers have developed a minimum data and metadata reporting standard for wildlife disease studies [1] [4] [2]. This standardized framework identifies a set of 40 data fields (9 required) and 24 metadata fields (7 required) sufficient to standardize and document datasets consisting of records disaggregated to the finest possible spatial, temporal, and taxonomic scale [1]. The standard aligns with FAIR principles (Findable, Accessible, Interoperable, and Reusable) and is designed to enhance the transparency, actionability, and global utility of wildlife disease research [2].
The data standard organizes 40 core fields into three logical categories: sample data, host animal data, and parasite data. These fields are designed to capture information at the most granular level possible, typically representing the outcome of a single diagnostic test [1]. The following tables summarize required and optional fields within each category.
Table 1: Sample-Related Data Fields (11 Fields, 3 Required)
| Field Name | Required/Optional | Description | Controlled Vocabulary Suggested |
|---|---|---|---|
| Sample ID | Required | Unique identifier for the sample | Free text |
| Sample matrix | Required | Type of sample collected | Swab, tissue, blood, feces, etc. |
| Sample preservation method | Optional | How the sample was preserved | RNA later, frozen, ethanol, etc. |
| Date of sample collection | Required | Date when sample was collected | ISO 8601 format (YYYY-MM-DD) |
| Time of sample collection | Optional | Time when sample was collected | ISO 8601 format (HH:MM) |
| Latitude | Optional | Decimal latitude of sampling location | WGS84 |
| Longitude | Optional | Decimal longitude of sampling location | WGS84 |
| Location uncertainty | Optional | Accuracy of location coordinates in meters | Free number |
| Country | Optional | Country of sampling location | ISO 3166-1 alpha-3 |
| Sampling scheme | Optional | Method used for selecting the sample | Targeted, random, convenience, etc. |
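The formats suggested in the table above can be checked with a few small functions. This is a sketch under stated assumptions: the function names are hypothetical, the date and time checks accept only the `YYYY-MM-DD` and `HH:MM` shapes shown in the table, and the country check only tests the three-letter shape of an ISO 3166-1 alpha-3 code, not membership in the official list.

```python
import re
from datetime import date, time


def is_iso_date(text):
    """True if text is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_iso_time(text):
    """True if text is a valid 24-hour time written as HH:MM."""
    if not re.fullmatch(r"\d{2}:\d{2}", text):
        return False
    try:
        time.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_valid_latlon(lat, lon):
    """True if the pair lies within the valid decimal-degree ranges."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


def looks_like_alpha3(code):
    """Shape check only: three uppercase letters, e.g. 'BLZ' for Belize."""
    return re.fullmatch(r"[A-Z]{3}", code) is not None
```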
Table 2: Host Organism Data Fields (13 Fields, 3 Required)
| Field Name | Required/Optional | Description | Controlled Vocabulary Suggested |
|---|---|---|---|
| Host species | Required | Scientific name of the host species | Binomial nomenclature |
| Animal ID | Optional | Unique identifier for the host individual | Free text |
| Host sex | Optional | Sex of the host individual | Male, Female, Unknown |
| Host age | Optional | Age of the host individual | Free text or numerical with unit |
| Host life stage | Optional | Life stage of the host | Adult, juvenile, subadult, etc. |
| Host reproductive status | Optional | Reproductive condition of host | Pregnant, lactating, etc. |
| Host health status | Optional | Clinical health assessment | Healthy, clinical signs, moribund, etc. |
| Method of host death | Optional | How the host died if applicable | Found dead, euthanized, hunted, etc. |
| Captive or wild | Required | Whether the host was wild or captive | Wild, Captive |
| Host behavior | Optional | Observed behavior of the host | Free text |
Table 3: Parasite/Pathogen Data Fields (16 Fields, 3 Required)
| Field Name | Required/Optional | Description | Controlled Vocabulary Suggested |
|---|---|---|---|
| Test ID | Required | Unique identifier for the diagnostic test | Free text |
| Test result | Required | Outcome of the diagnostic test | Positive, negative, inconclusive, etc. |
| Pathogen taxon tested | Optional | Target pathogen for the test | Free text (ideally taxonomic name) |
| Diagnostic test | Optional | Method used for pathogen detection | PCR, ELISA, culture, microscopy, etc. |
| Test validation status | Optional | Whether the test was validated | In-house, commercial, peer-reviewed, etc. |
| Gene target | Optional | Genetic target for molecular tests | Free text (e.g., RdRp, spike) |
| Primer citation | Optional | Reference for primers/probes used | DOI or citation |
| Ct value | Optional | Cycle threshold for PCR tests | Free number |
| Pathogen taxon identified | Optional | Identity of detected pathogen | Free text (ideally taxonomic name) |
| GenBank accession | Optional | Accession for genetic sequence data | GenBank format |
Beyond the core data fields, the standard includes 24 metadata fields (7 required) that provide essential context about the entire project or study [1]. These fields are crucial for making datasets findable, citable, and interpretable by secondary users.
Table 4: Required Project Metadata Fields
| Field Name | Description | Standard Suggested |
|---|---|---|
| Project title | Name of the research project | Free text |
| Project description | Abstract describing the project's aims and scope | Free text |
| Lead investigator | Person responsible for the project | Free text |
| Lead institution | Organization responsible for the project | Free text |
| ORCID | Unique identifier for the lead investigator | ORCID format |
| Project contact email | Email address for questions about the data | Email format |
| Funding source | Organization that funded the research | Free text |
Additional optional metadata fields include: other investigators, other institutions, other ORCIDs, project start date, project end date, project website, geographic scope, data collector, data contact email, keywords, recommended citation, license, data use agreement, data embargo date, and supplementary notes [1].
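Because the ORCID field expects a specific format, a basic check can catch typos before submission. The sketch below validates the 16-character, hyphen-separated ORCID iD shape and its final check character, which uses the ISO 7064 MOD 11-2 scheme. The function names are illustrative, and the check confirms only that an iD is well formed, not that it belongs to a particular researcher.

```python
import re


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if orcid looks like 0000-0000-0000-000X and its checksum matches."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

The iD `0000-0002-1825-0097`, used as a sample in ORCID's own documentation, passes this check.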
The following workflow diagram illustrates the step-by-step process for implementing the wildlife disease data standard in research practice, from study design through data sharing.
The following table details key research reagents and materials commonly used in wildlife disease research, particularly for pathogen detection and characterization.
Table 5: Essential Research Reagents for Wildlife Disease Studies
| Reagent/Material | Function | Application Examples |
|---|---|---|
| RNA/DNA Preservation Buffers | Stabilizes nucleic acids for transport and storage | RNA later for viral RNA preservation in field collections [1] |
| Nucleic Acid Extraction Kits | Isolates DNA/RNA from diverse sample matrices | Extraction of viral RNA from swabs for coronavirus detection [1] |
| PCR Master Mixes | Enzymatic amplification of target nucleic acid sequences | Coronavirus detection using pan-coronavirus PCR assays [1] |
| Specific Primers and Probes | Binds to target sequences for molecular detection | Coronavirus RdRp gene amplification [1] |
| ELISA Kits | Detects pathogen-specific antibodies or antigens | Serological screening for pathogen exposure [1] |
| Viral Transport Media | Maintains pathogen viability during transport | Preservation of viruses from swab samples for culture [1] |
| Field Collection Supplies | Enables safe and standardized sample collection | Sterile swabs, cryovials, personal protective equipment [1] |
This section provides a detailed methodological protocol for a typical wildlife disease study implementing the minimum data standard, using coronavirus detection in bats as an example [1].
When presenting wildlife disease data, appropriate visualization methods enhance comprehension of patterns and relationships. The table below compares common data visualization approaches relevant to wildlife disease research.
Table 6: Data Visualization Methods for Wildlife Disease Research
| Visualization Type | Best Use Cases | Standard Compliance Application |
|---|---|---|
| Bar Charts | Comparing prevalence across host species or locations | Visualizing differences in detection rates between species [6] |
| Line Charts | Showing disease trends over time | Displaying seasonal patterns in pathogen detection [7] |
| Tables | Presenting exact values for specific data points | Reporting complete standardized datasets with all fields [6] |
| Maps | Visualizing spatial distribution of sampling or detection | Showing geographic patterns of positive tests [1] |
For all visualizations, ensure sufficient color contrast (minimum 4.5:1 for standard text) to meet accessibility standards [8]. Avoid red-green color combinations, which are problematic for color-blind users [9]. Instead, use high-contrast alternatives such as blue with orange or magenta with green [9].
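The 4.5:1 threshold can be checked programmatically. The sketch below follows the relative-luminance and contrast-ratio definitions used by the Web Content Accessibility Guidelines (WCAG) for sRGB colors; the function names are illustrative, and colors are given as 8-bit `(R, G, B)` tuples.

```python
def _linearize(channel):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum_contrast(rgb_a, rgb_b, threshold=4.5):
    """True if the pair meets the 4.5:1 minimum for standard text."""
    return contrast_ratio(rgb_a, rgb_b) >= threshold
```

Black on white gives the maximum ratio of 21:1. Note that a contrast check does not by itself address color-blind accessibility, so palette choice still needs separate attention.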
In wildlife disease research, the pursuit of ecological insight has traditionally been dominated by positive findings: the detection of pathogens, the confirmation of outbreaks, and the identification of novel parasites. However, this focus creates a substantial blind spot, omitting the critical context provided by negative results and the granular metadata necessary for robust interpretation. The absence of this information severely limits our understanding of disease dynamics and represents a significant source of waste in scientific resources [10].
This Application Note frames these challenges within the urgent context of establishing and implementing a minimum data standard for wildlife disease research. Such standards are vital for transforming fragmented, non-reproducible data into FAIR (Findable, Accessible, Interoperable, and Reusable) resources that can power synthetic analyses, inform public health decisions, and ultimately strengthen our ecological insight [1] [11] [2]. We provide detailed protocols and analytical frameworks to empower researchers, scientists, and drug development professionals to consistently capture and report the full spectrum of data necessary for a comprehensive understanding of disease ecology.
Negative data (defined as results that show no detection of a pathogen, statistically insignificant findings, or outcomes that do not support an initial hypothesis) are systematically underrepresented in the scientific literature [10]. An analysis of 110 studies that tested wild bats for coronaviruses revealed that 96 studies (87%) reported data only in summarized format, making disaggregation and reanalysis impossible. Of the 14 studies that did share individual-level data, 11 only shared data for positive results, completely precluding any comparison of prevalence across populations, years, or species [1] [12]. This publication bias creates a distorted view of reality, inflating perceived risks and masking the true absence or limited distribution of pathogens.
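A short sketch shows why positive-only data cannot support prevalence estimates. The counts below are invented for illustration, and excluding inconclusive results from the denominator is an assumption made here, not a rule taken from the standard.

```python
def prevalence(results):
    """Fraction of conclusive tests that were positive.

    Inconclusive results are excluded from the denominator (an assumption).
    """
    conclusive = [r for r in results if r in ("positive", "negative")]
    if not conclusive:
        raise ValueError("no conclusive test results")
    positives = sum(1 for r in conclusive if r == "positive")
    return positives / len(conclusive)


# Hypothetical survey: 3 positives among 50 animals tested.
full_dataset = ["positive"] * 3 + ["negative"] * 47

# The same survey with negative rows withheld.
positives_only = [r for r in full_dataset if r == "positive"]
```

With the full dataset, `prevalence(full_dataset)` is 0.06; with the negatives withheld, `prevalence(positives_only)` is 1.0, a figure that says nothing about how common the pathogen actually is.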
Beyond negative results, a critical lack of contextual metadata plagues wildlife disease datasets. Studies frequently fail to report fundamental information such as:
To address these critical gaps, a cross-institutional consortium has proposed a flexible, minimum data and metadata reporting standard specifically for wildlife disease studies [1] [2]. This standard is designed to ensure that shared data are accompanied by sufficient context to be meaningfully reused, while remaining accessible to a broad range of practitioners.
The standard identifies 40 core data fields, categorized into three logical groups, with only 9 designated as required to maintain flexibility [1] [2]. The table below summarizes the key fields in each category.
Table 1: Core Data Fields in the Wildlife Disease Minimum Data Standard
| Category | Key Fields | Data Type | Description and Purpose |
|---|---|---|---|
| Sampling & Context | Sample ID | String | A researcher-generated unique ID for the sample (e.g., "OS_BZ19-114"). Essential for sample tracking [12]. |
| | Animal ID | String | A unique ID for the individual animal. Can be blank for pooled samples [12]. |
| | Collection Date | Date | The date of sample collection. Critical for temporal trend analysis [1]. |
| | Sampling Method | String | e.g., "live capture", "passive surveillance". Provides context for potential sampling bias [13]. |
| Host Organism | Host Identification | String | The Linnaean classification (ideally species binomial). Equivalent to dwc:scientificName [12]. |
| | Organism Sex | String | The sex of the individual animal. Equivalent to dwc:sex [12]. |
| | Host Life Stage | String | e.g., "juvenile", "adult". Important for understanding age-related susceptibility [12]. |
| | Mass, Mass Units | Number, String | The body mass of the animal at collection, with specified units. A key indicator of host condition [12]. |
| Parasite/Pathogen | Test Result | String | The outcome of the diagnostic test (e.g., "positive", "negative", "inconclusive"). The primary record for negative data [1]. |
| | Pathogen | String | The identity of the parasite/pathogen tested for. Must be specified even for negative results [1]. |
| | Pathogen Taxon ID | String | A taxonomic identifier (e.g., from NCBI Taxonomy). Enables precise data integration [1]. |
| | GenBank Accession | String | Accession number for associated genetic sequence data. Links to detailed molecular data [1]. |
In addition to the core data, the standard outlines 24 metadata fields (7 required) that describe the project as a whole. These are aligned with the DataCite Metadata Schema and are crucial for discovery and citation [1]. Key metadata includes:
This metadata layer ensures that the dataset is not just a standalone table, but a properly contextualized and citable research output [1] [11].
The following diagram outlines the key stages for planning and executing a wildlife disease study in compliance with the minimum data standard, from formulation to data sharing.
Application: This protocol is designed for active surveillance of pathogens (e.g., viruses, bacteria) in wild animal populations, such as longitudinal studies or outbreak investigations [1].
Materials:
Procedure:
Pre-Field Planning:
Sample Collection:
Assign each animal a unique Animal ID.
Record the Collection Date and precise geographic coordinates using a GPS.
Record the host species (Host Identification), Sex, Life Stage, and Mass [12].
Assign each sample a Sample ID that is logically linked to the Animal ID (e.g., Animal BZ19-114 -> Samples OS_BZ19-114, RS_BZ19-114).
Record the Sampling Method and any other relevant contextual notes.
Diagnostic Testing:
Record the Test Result ("positive", "negative", or "inconclusive").
Record the Forward Primer Sequence, Reverse Primer Sequence, Gene Target, and Primer Citation [1].
Where applicable, record the Pathogen Taxon ID and GenBank Accession.
Data Assembly and Validation:
Use a validation tool (e.g., wddsWizard) to check for formatting errors and compliance with the standard [1].
This diagram illustrates the fundamental data structure of "tidy data," where each row represents a single test, and how this structure accommodates both positive and negative results, linking them to critical metadata.
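The ID-linking convention from the sample collection steps above (Animal BZ19-114 -> Samples OS_BZ19-114, RS_BZ19-114) can be sketched as follows. The prefix table is an assumption: it reads "OS" and "RS" as oral swab and rectal swab, which the source example suggests but does not state, and the function names are hypothetical.

```python
# Assumed meaning of the prefixes in the example IDs (not stated in the source).
SAMPLE_PREFIXES = {"oral swab": "OS", "rectal swab": "RS"}


def make_sample_id(animal_id, sample_type):
    """Build a Sample ID that embeds the Animal ID, e.g. 'OS_BZ19-114'."""
    return f"{SAMPLE_PREFIXES[sample_type]}_{animal_id}"


def animal_id_from_sample_id(sample_id):
    """Recover the Animal ID from a Sample ID built by make_sample_id."""
    prefix, separator, animal_id = sample_id.partition("_")
    if not separator or prefix not in SAMPLE_PREFIXES.values() or not animal_id:
        raise ValueError(f"unrecognized sample ID: {sample_id!r}")
    return animal_id
```

Embedding the Animal ID in the Sample ID is a convenience for field work; the Animal ID should still be recorded in its own column so that the link does not depend on parsing identifiers.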
The following table details key reagents and materials essential for conducting standardized wildlife disease research, as referenced in the protocols and studies above.
Table 2: Essential Research Reagents and Materials for Wildlife Disease Studies
| Reagent/Material | Function/Application | Critical Metadata to Record |
|---|---|---|
| Sterile Swabs & Transport Media | Collection and preservation of samples from mucosal surfaces, wounds, or tissues. | Lot number, type of swab (e.g., nylon-flocked), composition of transport medium, storage temperature. |
| PCR Primers & Probes | In vitro detection of specific pathogen genetic material via polymerase chain reaction. | Forward and Reverse Primer sequences, Gene Target, Primer Citation, probe sequence and chemistry (e.g., TaqMan) [1]. |
| ELISA Kits | Immunoassay for detecting pathogen-specific antibodies or antigens in serum or other fluids. | Manufacturer, catalog number, lot number, target antigen/antibody, and citation for the protocol if adapted. |
| RNA/DNA Extraction Kits | Isolation of high-quality nucleic acids from diverse sample matrices (swabs, tissues, blood). | Manufacturer, kit name, version, lot number. Elution volume and concentration of the final extract should also be recorded. |
| Next-Generation Sequencing Library Prep Kits | Preparation of nucleic acid libraries for metagenomic or transcriptomic sequencing on platforms like Illumina. | Kit name and version, lot number, protocol deviations. Links to the resulting Sequence Read Archive (SRA) accessions are critical [1]. |
The adoption of a minimum data standard that mandates the reporting of negative results and granular metadata is not merely an academic exercise in data management. It is a fundamental prerequisite for generating genuine ecological insight and strengthening our defenses against emerging zoonotic threats. By implementing the protocols and frameworks outlined in this Application Note, the research community can transform isolated datasets into a collaborative, reusable, and powerful resource. This practice ensures that every data point, whether positive or negative, contributes to a cumulative and accurate understanding of wildlife disease dynamics, ultimately advancing the goals of both conservation and global health security.
The COVID-19 pandemic demonstrated that infectious disease threats anywhere are threats everywhere, revealing critical weaknesses in global early warning systems [14] [15]. Approximately 60% of emerging infectious diseases originate from animals, with wildlife serving as a primary source for novel pathogens [2]. Despite this recognized threat, wildlife disease research has historically been hampered by fragmented, inconsistent data collection and reporting practices that limit the utility of surveillance data for global health security [1] [2]. The recent development of a minimum data standard for wildlife disease research addresses this critical gap by establishing standardized reporting frameworks that enable data aggregation, analysis, and interoperability across studies and jurisdictions [1]. This protocol outlines the application of this data standard within the broader context of global health security frameworks, including the World Health Organization's Pandemic Agreement and the Global Health Security Agenda [16] [14]. By implementing these standardized approaches, researchers can directly contribute to strengthening global capacity for pandemic prevention, preparedness, and response.
The minimum data standard for wildlife disease research provides a comprehensive yet flexible framework for recording and reporting essential information from wildlife disease studies [1]. Developed through an iterative process incorporating real-world data and existing best practices, the standard is designed to be accessible to diverse practitioners while providing sufficient structure for large-scale data analysis [1] [12]. The framework aligns with FAIR principles (Findable, Accessible, Interoperable, and Reusable) and supports global health security objectives by enabling data aggregation and synthesis across studies and surveillance systems [1] [2].
The standard identifies 40 core data fields organized into three logical categories and 24 metadata fields for project-level documentation [1] [12]. This structure ensures that data shared by researchers contains sufficient contextual information for meaningful interpretation and reuse by other scientists, public health agencies, and policymakers [1]. The "tidy data" format, where each row corresponds to a single diagnostic test outcome, facilitates both human interpretation and machine processing [1].
Table 1: Required Data Fields in the Wildlife Disease Data Standard
| Category | Field Name | Data Type | Description | Global Health Security Relevance |
|---|---|---|---|---|
| Sampling | Sample ID | String | Unique identifier for the sample | Enables specimen tracking across laboratories and databases |
| Sampling | Collection Date | Date | Date when sample was collected | Critical for temporal analysis of pathogen emergence and spread |
| Sampling | Location | String | Geographic coordinates of sampling | Allows spatial mapping of disease risks and hotspots |
| Host | Host Identification | String | Linnaean classification of host | Identifies reservoir species and host range for risk assessment |
| Parasite | Pathogen Tested For | String | Target pathogen of diagnostic test | Documents surveillance priorities and testing capabilities |
| Parasite | Diagnostic Test | String | Method used for pathogen detection | Informs test accuracy and comparability across studies |
| Parasite | Test Result | String | Outcome of diagnostic test | Enables prevalence calculations; includes negative data |
| Parasite | Pathogen Identity | String | Identification of detected pathogen | Documents novel pathogen discovery and genetic diversity |
| Parasite | GenBank Accession | String | Reference to genetic sequence data | Links to molecular data for pathogen characterization |
The data standard is designed for flexibility across diverse research and surveillance contexts [1]. Suitable project types include:
The standard specifically excludes certain data types better served by other specialized standards, such as free-living macroparasite records (Darwin Core format), arthropod blood meal data (specialized vector standards), and environmental microbiome data without associated host animals [1].
Objective: To ensure consistent collection of essential data fields during wildlife disease investigations for compatibility with global health security databases.
Materials Required:
Procedure:
Field Sampling and Data Recording
Sample Processing and Storage
Diagnostic Testing
Table 2: Essential Research Reagents and Materials
| Reagent/Material | Specification | Application | Quality Control |
|---|---|---|---|
| Nucleic Acid Extraction Kits | Compatible with sample type (blood, tissue, swab) | Pathogen genetic material isolation | Include extraction controls; validate for sensitivity |
| PCR Primers/Probes | Target conserved pathogen genes | Pathogen detection and identification | Verify specificity; include positive and negative controls |
| ELISA Kits | Validated for wildlife species when possible | Antibody detection; serosurveillance | Assess cross-reactivity; establish species-specific cutoffs |
| Viral Transport Media | Compatible with downstream applications | Preserve viability for virus isolation | Test for inhibition; batch validation |
| Rapid Diagnostic Tests | Field-deployable formats | Preliminary screening in remote areas | Validate against reference standards |
| Microscopy Supplies | Stains, slides, fixatives | Parasite identification and morphology | Standardize examination protocols |
The following workflow diagrams the standardized process for managing and sharing wildlife disease data according to the minimum data standard:
Data Validation and Sharing Steps:
Standardized wildlife disease data directly supports the objectives of major global health security initiatives by enabling early detection of zoonotic threats and facilitating rapid risk assessment [16] [14]. The WHO Pandemic Agreement, adopted in May 2025, emphasizes equitable access to pandemic countermeasures and strengthened global coordination [16]. Implementation of the wildlife disease data standard contributes to these goals through:
The data standard also aligns directly with the U.S. Global Health Security Strategy (2024) and Global Health Security Agenda targets by strengthening core capabilities in surveillance, laboratory systems, and emergency response [14]. Specifically, it addresses the GHSA objective for countries to take greater ownership of health security efforts through standardized, sustainable surveillance systems [14].
Successful implementation of the data standard requires addressing several practical considerations:
The minimum data standard for wildlife disease research represents a practical, implementable framework for aligning ecological surveillance with global health security priorities [1] [2]. By adopting this standard, researchers directly contribute to strengthening the global early warning system for emerging zoonotic threats and support the implementation of international agreements like the WHO Pandemic Agreement [16]. The protocol outlined in this document provides a clear pathway for researchers to standardize data collection, management, and sharing practices in ways that enhance both scientific understanding and public health security. As the COVID-19 pandemic demonstrated, technical preparedness must be coupled with data transparency and international solidarity to effectively address global health threats [15]. Widespread adoption of these standardized approaches will help ensure that wildlife disease research fulfills its critical role in pandemic prevention, preparedness, and response.
The establishment of a minimum data standard is a critical advancement in wildlife disease research, a field essential for global health security and ecological stability [2]. The initial and most crucial phase in implementing this standard is determining fitness for purpose and defining the project scope. This step ensures that the data collected are not merely abundant but are suitable for their intended use, whether outbreak investigation, pathogen discovery, or long-term surveillance [1]. A well-defined scope is the foundation for generating data that are findable, accessible, interoperable, and reusable (FAIR), thereby maximizing the scientific and public health impact of the research [1] [2]. This protocol provides a detailed framework for researchers to navigate this essential first step.
The "fit-for-purpose" (FfP) concept, increasingly adopted in regulated scientific research, emphasizes that study design elements must be directly aligned with the primary research objective [17]. In the context of a minimum data standard for wildlife disease research, this means the data collected must be structurally and contextually sufficient to answer the specific research question and be usable by the broader scientific community.
The guiding philosophy of the minimum data standard is that researchers should share raw wildlife disease data in a "tidy" or "rectangular" format, where each row corresponds to a single diagnostic measurement [1]. This structure is vital because research data are often fragmented and inconsistently reported. Many studies only provide summary statistics or share data solely for positive results, which prevents meaningful aggregation and analysis across different studies and hampers the understanding of disease dynamics [1]. Applying an FfP assessment at the project's inception ensures that the resulting dataset will have the granularity and metadata required for both immediate analysis and future reuse.
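To illustrate why row-level data with negative results matter, the following minimal sketch computes prevalence from one-row-per-test records. The column names (`host_species`, `test_result`), the example rows, and the choice to exclude inconclusive tests from the denominator are all assumptions made for illustration, not requirements of the standard.

```python
from collections import defaultdict

# Illustrative rows only: one row per diagnostic test, negatives included.
tests = [
    {"host_species": "Desmodus rotundus", "test_result": "positive"},
    {"host_species": "Desmodus rotundus", "test_result": "negative"},
    {"host_species": "Desmodus rotundus", "test_result": "negative"},
    {"host_species": "Desmodus rotundus", "test_result": "inconclusive"},
]


def prevalence_by_host(rows):
    """Return positives / (positives + negatives) per host species."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for row in rows:
        result = row["test_result"].lower()
        if result in ("positive", "negative"):  # assumption: skip inconclusive tests
            counts[row["host_species"]][result] += 1
    return {
        host: c["positive"] / (c["positive"] + c["negative"])
        for host, c in counts.items()
    }


prevalence = prevalence_by_host(tests)
# Dropping the negative rows would leave only the positive test,
# and the same calculation would then report a prevalence of 1.0.
positives_only = prevalence_by_host(tests[:1])
```

With the full table the estimate is one positive out of three conclusive tests; with positives alone the denominator is lost, which is the aggregation problem described above.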
The minimum data standard is designed for a wide range of project types involving the examination of wild animals for parasites (including viruses, bacteria, and other pathogens) [1]. Before applying the standard, researchers should verify their project's alignment with the following general categories:
It is important to note that related data types, such as records of free-living macroparasites (e.g., from tick dragging) or environmental microbiome data, are better documented using other specialized standards like Darwin Core or MIReAD [1].
Defining the project scope is an actionable process that translates the FfP concept into a concrete research plan. The following steps and workflow provide a methodology for establishing a robust scope.
The following diagram illustrates the logical workflow for defining a project scope that is fit-for-purpose.
The minimum data standard provides a flexible framework of 40 data fields categorized into three groups. The following tables summarize the required and conditional fields essential for ensuring data completeness and interoperability. During the scoping phase, the research team must decide which of these fields will be populated.
This table outlines the core fields required to document the context of the sample and the host organism [1].
| Field Name | Category | Requirement Level | Explanation & Usage |
|---|---|---|---|
| Animal ID | Host | Required | A unique identifier for the individual host animal. |
| Host Species | Host | Required | The scientific name (e.g., Desmodus rotundus) is strongly recommended. |
| Sample ID | Sample | Required | A unique identifier for the biological sample collected. |
| Sample Type | Sample | Required | The type of sample collected (e.g., oral swab, rectal swab, blood, tissue). |
| Sample Date | Sample | Required | The date the sample was collected. |
| Latitude | Sample | Required | The decimal degree latitude of the sampling location. |
| Longitude | Sample | Required | The decimal degree longitude of the sampling location. |
| Host Sex | Host | Conditionally Required | The sex of the host animal, if collected. |
| Host Age Class | Host | Conditionally Required | The age class of the host animal (e.g., adult, juvenile), if collected. |
| Life Stage | Host | Conditionally Required | The life stage of the host animal, if collected and applicable. |
This table outlines the fields required to report the diagnostic methods and results [1].
| Field Name | Category | Requirement Level | Explanation & Usage |
|---|---|---|---|
| Test ID | Parasite | Required | A unique identifier for the specific diagnostic test performed. |
| Test Result | Parasite | Required | The outcome of the diagnostic test (e.g., Positive, Negative, Inconclusive). |
| Diagnostic Method | Parasite | Conditionally Required | The specific method used (e.g., PCR, ELISA, metagenomics). Required if a test was performed. |
| Gene Target | Parasite | Conditionally Required | The specific gene targeted (e.g., RNA-dependent RNA polymerase). Required for PCR-based tests. |
| Parasite Taxon | Parasite | Conditionally Required | The identity of the detected parasite. Required if the test result is positive and identity is known. |
| GenBank Accession | Parasite | Conditionally Required | The accession number for genetic sequence data submitted to a public repository. |
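During scoping, the requirement levels in the two tables above can be turned into a simple completeness check. The sketch below is a hypothetical helper, not the official validator; it uses the field names from the tables as dictionary keys and encodes only one conditional rule (Gene Target for PCR-based tests), and the example record values are made up for illustration.

```python
# Field names follow the two tables above; the checker itself is hypothetical.
REQUIRED = [
    "Animal ID", "Host Species", "Sample ID", "Sample Type", "Sample Date",
    "Latitude", "Longitude", "Test ID", "Test Result",
]


def missing_fields(record: dict) -> list:
    """List required or conditionally required fields that are empty."""
    def empty(name):
        value = record.get(name)
        return value is None or str(value).strip() == ""

    missing = [name for name in REQUIRED if empty(name)]
    # Gene Target is conditionally required for PCR-based tests.
    method = str(record.get("Diagnostic Method") or "").lower()
    if "pcr" in method and empty("Gene Target"):
        missing.append("Gene Target")
    return missing


# Illustrative record: complete except for the PCR gene target.
example = {
    "Animal ID": "BZ19-114", "Host Species": "Desmodus rotundus",
    "Sample ID": "OS_BZ19-114", "Sample Type": "oral swab",
    "Sample Date": "2019-06-01", "Latitude": 17.25, "Longitude": -88.77,
    "Test ID": "T-001", "Test Result": "negative", "Diagnostic Method": "PCR",
}
print(missing_fields(example))
```

Rules that depend on judgment, such as reporting Parasite Taxon only when the identity is known, are deliberately left out of this sketch.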
The following toolkit details key reagents and materials commonly used in wildlife disease research, explaining their critical function within the context of the minimum data standard.
| Item | Function in Wildlife Disease Research |
|---|---|
| Swabs (e.g., oral, rectal) | For non-lethal collection of mucosal samples for pathogen detection, crucial for longitudinal studies and minimizing harm [1]. |
| Nucleic Acid Extraction Kits | To isolate DNA/RNA from diverse sample matrices for subsequent molecular assays like PCR and metagenomics. |
| PCR Primers & Master Mixes | Core reagents for targeted molecular detection and identification of pathogens; the primer citation is a key data field [1]. |
| ELISA Kits & Antibodies | For serological detection of pathogen exposure (antibodies) or specific antigens in host samples. |
| Viral Transport Media (VTM) | To preserve the viability and nucleic acid integrity of viruses in swab samples during transport and storage. |
| Liquid Nitrogen Dewar | For cryopreservation of samples in the field, maintaining sample integrity for future analyses. |
| Global Positioning System (GPS) | To record precise latitude and longitude, which are required fields for spatial analysis and mapping [1]. |
A fit-for-purpose scope and adherence to a data standard synergize with other reporting guidelines that promote rigorous and ethical science. For instance, the ARRIVE 2.0 guidelines (Animal Research: Reporting of In Vivo Experiments) provide a checklist to improve the transparency of animal research publications [18]. While the minimum data standard focuses on the structure and content of the underlying dataset, ARRIVE 2.0 ensures the accompanying manuscript adequately describes the experimental design, methods, and results. Using both frameworks in tandem enhances the overall reproducibility, ethical justification, and utility of wildlife disease studies.
The adoption of a minimum data standard is a critical step towards achieving transparency and actionability in wildlife disease research and surveillance [1]. The proposed standard identifies a set of 40 core data fields and 24 metadata fields to document datasets at the finest possible spatial, temporal, and taxonomic scale [1]. This document provides detailed Application Notes and Protocols for Step 2: Tailoring the Standard, guiding researchers in selecting applicable fields and appropriate ontologies for their specific study designs. Proper implementation ensures data is Findable, Accessible, Interoperable, and Reusable (FAIR), maximizing its utility for ecological analysis, disease tracking, and synthesis research.
The minimum data standard comprises 40 core fields categorized into sampling, host organism, and parasite data. Nine of these fields are mandatory for all studies, while the applicability of others depends on the research context and methods [1]. The table below summarizes all core fields, their categories, and their requirement status.
| Field Name | Category | Requirement Level | Description & Applicability |
|---|---|---|---|
| Animal ID | Host | Required | Unique identifier for the individual host animal. |
| Host species | Host | Required | Scientific name (genus, species) of the host animal. |
| Sample ID | Sample | Required | Unique identifier for the specific sample collected. |
| Sample type | Sample | Required | e.g., oral swab, blood, tissue. |
| Diagnostic test name | Parasite | Required | Name of the test used (e.g., PCR, ELISA, culture). |
| Test result | Parasite | Required | Outcome of the diagnostic test (e.g., positive, negative, inconclusive). |
| Test date | Parasite | Required | Date the diagnostic test was performed. |
| Latitude | Sample | Required | Decimal degrees of sample collection location. |
| Longitude | Sample | Required | Decimal degrees of sample collection location. |
| Host age class | Host | Conditional | Applicable if age data is collected. |
| Host sex | Host | Conditional | Applicable if sex is determined. |
| Life stage | Host | Conditional | Applicable if recorded. |
| Forward primer sequence | Parasite | Conditional | Required for studies using PCR. |
| Reverse primer sequence | Parasite | Conditional | Required for studies using PCR. |
| Gene target | Parasite | Conditional | Required for studies using PCR. |
| Primer citation | Parasite | Conditional | Required for studies using PCR. |
| Probe target | Parasite | Conditional | Required for studies using ELISA. |
| Probe type | Parasite | Conditional | Required for studies using ELISA. |
| Probe citation | Parasite | Conditional | Required for studies using ELISA. |
| Parasite species | Parasite | Conditional | Identity of the detected parasite; relevant for positive results. |
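The method-dependent rows of the table above lend themselves to a small lookup that lists which conditional fields a study should plan to populate. This is a minimal sketch under stated assumptions: the function name and the dictionary are hypothetical, and only the PCR and ELISA rules shown in the table are encoded.

```python
# Method-specific conditional fields, copied from the table above.
CONDITIONAL_BY_METHOD = {
    "PCR": ["Forward primer sequence", "Reverse primer sequence",
            "Gene target", "Primer citation"],
    "ELISA": ["Probe target", "Probe type", "Probe citation"],
}


def fields_for_study(methods):
    """Return the extra fields a study should populate for its methods."""
    selected = []
    for method in methods:
        # Methods without an entry (e.g., culture) add no extra fields here.
        for field in CONDITIONAL_BY_METHOD.get(method.upper(), []):
            if field not in selected:
                selected.append(field)
    return selected


print(fields_for_study(["PCR", "ELISA"]))
```

A study that uses both assays would plan for all seven fields, while a culture-only study would rely on the mandatory fields plus any host attributes it records.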
The following protocol provides a step-by-step methodology for tailoring the data standard to a specific research project.
Objective: To systematically identify which data fields beyond the mandatory ones are relevant to a study and to select appropriate controlled vocabularies for those fields.
Materials: The data standard template (available in .csv and .xlsx formats from the official GitHub repository: github.com/viralemergence/wdds) [1], the study's experimental design document, and access to the listed ontology resources.
Procedure:
Study Design Audit:
Field Selection Matrix:
Ontology and Vocabulary Alignment:
Data Table Formatting:
Metadata Documentation:
The following diagram illustrates the logical workflow for tailoring the data standard, from initial field identification to final data validation.
The following table details key reagents and materials used in wildlife disease research, with a focus on their function within the context of data generation for this standard.
| Item | Primary Function | Application Notes |
|---|---|---|
| Sterile Swabs | Collection of biological samples from mucosal surfaces, wounds, or fur. | Different types (e.g., oral, rectal, nasal) must be precisely recorded in the "Sample type" field. |
| RNA/DNA Stabilization Buffer | Preserves nucleic acids at ambient temperature for transport from field to lab. | Critical for ensuring the integrity of genetic material for subsequent PCR testing. |
| PCR Master Mix | Contains enzymes, dNTPs, and buffer for the amplification of specific DNA/RNA targets. | Its use necessitates recording "Primer sequences," "Gene target," and "Primer citation" in the data table. |
| Species-Specific Primer/Probe Sets | Oligonucleotides designed to bind to and detect unique genetic sequences of a target parasite. | The core of specific diagnostic tests. The sequences and citations are critical metadata for reproducibility. |
| ELISA Kit | Immunoassay for detecting the presence of antigens or antibodies in a sample. | Using this reagent requires populating fields like "Probe target" and "Probe type" in the standard. |
| Personal Protective Equipment (PPE) | Mitigates zoonotic risk to researchers and prevents cross-contamination between animals/samples [19]. | Includes nitrile gloves, leather gloves for bite risks, and long-sleeved clothing. Safety protocols should be documented. |
| Field Decontamination Supplies | Prevents the spread of pathogens between sampling sites and animals [19]. | Includes bleach, alcohol, and sodium thiosulfate for neutralizing disinfectants. Decontamination methods should be noted. |
To ensure interoperability, the use of controlled vocabularies and ontologies is strongly encouraged for the free-text fields within the standard. The following table provides a mapping of common data fields to recommended semantic resources.
| Data Field Category | Example Fields | Recommended Ontology / Vocabulary | Notes and Access |
|---|---|---|---|
| Host Taxonomy | Host species | Global Biodiversity Information Facility (GBIF) Backbone Taxonomy / NCBI Taxonomy | Provides authoritative and updated scientific names. Use the full binomial name. |
| Location | Country, Location | GeoNames | A global geographical database. Can be used to standardize location names beyond coordinates. |
| Sample Details | Sample type | Environment Ontology (ENVO) / NCBI BioSample | ENVO includes terms for host-associated environmental materials like "oral swab" or "feces." |
| Diagnostic Methods | Diagnostic test name | Ontology for Biomedical Investigations (OBI) | Contains standardized terms for common laboratory processes and assays. |
| Parasite Taxonomy | Parasite species | NCBI Taxonomy | The standard for pathogen naming, especially for viruses and bacteria. |
| Life History Traits | Host sex, Life stage | UBERON Anatomy Ontology / Phenotype And Trait Ontology (PATO) | PATO includes terms for "female," "male," and life stages like "adult" or "juvenile." |
While comprehensive data sharing is a core goal, researchers must navigate potential safety and ethical concerns [1]. Data should be shared at a spatial resolution that does not facilitate the targeting of endangered or threatened species for poaching or persecution. For research involving high-consequence pathogens, a temporary embargo on public data release may be justified to allow for official reporting and public health communication. In all cases, the sharing of precise location data must be balanced with conservation and safety imperatives. All wildlife research must be conducted under approved animal care and use protocols, with appropriate safety measures for zoonotic hazards documented [19].
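One simple way to share location data at a reduced spatial resolution is to round the decimal-degree coordinates before release. The sketch below is an illustration under stated assumptions: the function name, the default of one decimal place (roughly 11 km in latitude), and the example coordinates are hypothetical, and the appropriate resolution for a given species should follow the relevant conservation guidance.

```python
def coarsen_coordinates(latitude: float, longitude: float, decimals: int = 1):
    """Round decimal-degree coordinates to reduce spatial precision."""
    return round(latitude, decimals), round(longitude, decimals)


# Illustrative coordinates only.
precise = (17.25134, -88.76921)
shared = coarsen_coordinates(*precise)
print(shared)
```

If coordinates are coarsened, it is worth stating the applied precision in the project metadata so that downstream users do not mistake rounded values for exact sampling sites.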
In the context of wildlife disease research, adopting a consistent data structure is a critical minimum standard that enables robust analysis, collaboration, and reproducibility. Tidy data provides a unified framework for organizing data, ensuring that datasets are "all alike" and therefore easier to manipulate, model, and visualize [20]. The core principles require that every dataset be structured such that each variable forms a column, each observation forms a row, and each value resides in its own cell [20] [21]. Adhering to this format from the outset of data collection minimizes wrangling time and reduces errors in subsequent analytical phases.
For example, pathogen_strain, host_species, collection_date, and viral_load would each form a separate column.

Field and laboratory data are often recorded in wide formats optimized for data entry, which violates tidy data principles and complicates analysis. The transformation to a tidy format is demonstrated below.
Table 1: Common Messy Field Data Format
| Region | Year | Canine_Distemper_Count | Avian_Influenza_Count |
|---|---|---|---|
| Northeast | 2022 | 15 | 3 |
| Northeast | 2023 | 22 | 5 |
| Southwest | 2022 | 8 | 12 |
| Southwest | 2023 | 11 | 15 |
Table 2: Tidy Data Format After Restructuring
| Region | Year | Disease | Case_Count |
|---|---|---|---|
| Northeast | 2022 | Canine_Distemper | 15 |
| Northeast | 2022 | Avian_Influenza | 3 |
| Northeast | 2023 | Canine_Distemper | 22 |
| Northeast | 2023 | Avian_Influenza | 5 |
| Southwest | 2022 | Canine_Distemper | 8 |
| Southwest | 2022 | Avian_Influenza | 12 |
| Southwest | 2023 | Canine_Distemper | 11 |
| Southwest | 2023 | Avian_Influenza | 15 |
In the tidy version (Table 2), Disease is a single variable, and Case_Count is another, making it straightforward to filter, group, and visualize data by disease type.
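The restructuring from the wide table to the tidy table can be sketched in a few lines. The example below uses only the Python standard library rather than R; it assumes the wide column headers carry a `_Count` suffix (`Canine_Distemper_Count`, `Avian_Influenza_Count`), and the function name `to_tidy` is hypothetical.

```python
import csv
import io

# The wide table, inlined as CSV text so the example is self-contained.
WIDE_CSV = """Region,Year,Canine_Distemper_Count,Avian_Influenza_Count
Northeast,2022,15,3
Northeast,2023,22,5
Southwest,2022,8,12
Southwest,2023,11,15
"""

ID_COLUMNS = ("Region", "Year")


def to_tidy(wide_rows):
    """Turn one row per region-year into one row per region-year-disease."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            tidy.append({
                "Region": row["Region"],
                "Year": int(row["Year"]),
                # Strip the "_Count" suffix (str.removesuffix needs Python 3.9+).
                "Disease": column.removesuffix("_Count"),
                "Case_Count": int(value),
            })
    return tidy


tidy_rows = to_tidy(csv.DictReader(io.StringIO(WIDE_CSV)))
for row in tidy_rows[:2]:
    print(row)
```

Each wide row yields one tidy row per disease column, so the four wide rows become the eight rows of the tidy table.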
Protocol Title: Conversion of Wide-Format Surveillance Data to Tidy Format Using R and the tidyr Package.
Objective: To standardize a messy dataset where column names represent values of a variable (e.g., different disease names) into a tidy format suitable for statistical analysis.
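Materials:

- R with the tidyverse meta-package (includes tidyr and dplyr)

Methodology:

1. Import the dataset (messy_data.csv) into R using read_csv().
2. Identify the columns whose names hold values of a variable (Canine_Distemper_Count, Avian_Influenza_Count). These will be pivoted.
3. Use the pivot_longer() function from the tidyr package to reshape the data.
   - Select the columns to pivot with cols.
   - Name the new column that will hold the former headers with names_to (e.g., "Disease").
   - Name the new column that will hold the cell values with values_to (e.g., "Case_Count").
4. Clean the Disease column (e.g., remove the "_Count" suffix) using mutate() and stringr functions.

R Code Example: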
The following diagram illustrates the logical process and decision points for achieving and maintaining tidy data within a research project.
Table 3: Essential Software Tools for Tidy Data Management
| Tool Name | Function | Application in Wildlife Disease Research |
|---|---|---|
| R tidyverse Metapackage | A collection of R packages for data science. | Provides a cohesive set of tools (e.g., dplyr, tidyr, ggplot2) for importing, tidying, transforming, and visualizing complex ecological and disease data [20] [22]. |
| pivot_longer() Function (tidyr) | Reshapes data from a wide to a long format. | Critical for fixing tables where different pathogens or measurements are stored as column headers instead of a categorical variable [20]. |
| pivot_wider() Function (tidyr) | Reshapes data from a long to a wide format. | Useful for creating summary tables or formatting data for specific statistical models that require a wide matrix. |
| readr Package | Provides fast and user-friendly functions to read rectangular data (e.g., CSV, TSV). | Ensures reliable and efficient import of large field data sheets into R, correctly preserving data types [22]. |
| ggplot2 Package | A system for creating declarative graphics based on "The Grammar of Graphics." | Directly leverages tidy data structure to create complex, multi-layered visualizations of disease prevalence, spatiotemporal trends, and host-pathogen interactions [20]. |
Within the framework of a minimum data standard for wildlife disease research, the inclusion of diagnostic-specific fields is not merely an administrative exercise; it is fundamental to ensuring data interoperability, reproducibility, and reusability [1]. Such standards are vital for creating actionable wildlife health intelligence, which is critical for both ecological health and global pandemic preparedness [2]. A core principle of the minimum data standard is the disaggregation of data to the finest possible spatial, temporal, and taxonomic scale, often in a "tidy" or "rectangular" data format where each row represents a single diagnostic test outcome [1].
While the standard defines a common set of core fields for host, sample, and parasite information, the diagnostic methodology used in a study (be it PCR, ELISA, or others) determines a subset of additional, highly specific fields that are essential for a complete record [1]. Providing this granular, method-specific metadata allows future researchers to properly interpret results, assess the assay's validity, and even aggregate data from disparate studies for powerful synthetic analysis. This section provides a detailed guide to identifying, populating, and formatting these diagnostic-specific fields for common assays in wildlife disease research.
The minimum data standard is designed with the flexibility to accommodate a wide range of diagnostic techniques [1]. The table below summarizes the core required fields for any diagnostic record and then details the additional, conditional fields required for specific assay types.
| Field Name | Field Category | Applicability & Description | Data Type |
|---|---|---|---|
| Animal ID | Core (Required) | Unique identifier for the host animal. | Text |
| Sample ID | Core (Required) | Unique identifier for the biological sample. | Text |
| Test ID | Core (Required) | Unique identifier for a specific diagnostic test. | Text |
| Diagnostic method | Core (Required) | The technique used (e.g., "PCR", "ELISA", "Virus Isolation"). | Text |
| Test result | Core (Required) | The outcome of the test (e.g., "positive", "negative", "inconclusive"). | Text |
| Test date | Core (Required) | Date the test was performed (YYYY-MM-DD). | Date |
| Parasite species | Conditional | Identity of the detected parasite; required if test is positive. | Text |
| Forward primer sequence | PCR-specific | Nucleotide sequence of the forward primer. | Text |
| Reverse primer sequence | PCR-specific | Nucleotide sequence of the reverse primer. | Text |
| Gene target | PCR-specific | The specific gene or genomic region targeted (e.g., "RdRp", "N gene"). | Text |
| Primer citation | PCR-specific | Publication or source detailing the primer set. | Text |
| Probe target | ELISA-specific | The specific antigen or antibody targeted by the probe. | Text |
| Probe type | ELISA-specific | The type of probe used (e.g., "antigen", "antibody"). | Text |
| Probe citation | ELISA-specific | Publication or source detailing the probe. | Text |
This structured approach ensures that data from a study using PCR to detect a novel coronavirus in bats [1] and a study using an optimized ELISA for Morganella morganii in livestock [23] can both be formatted with the requisite detail for future reuse, despite their different methodological and taxonomic focuses.
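Because the diagnostic-specific fields are free text, simple format checks can catch entry errors before data are shared. The sketch below is illustrative and not part of the standard: the function names are hypothetical, primers are checked against the IUPAC nucleotide alphabet (including ambiguity codes), and test dates are checked against the YYYY-MM-DD format given in the table. The example sequence is a placeholder, not a published primer.

```python
import re
from datetime import date

# IUPAC nucleotide codes, including ambiguity codes used in degenerate primers.
PRIMER_PATTERN = re.compile(r"[ACGTURYSWKMBDHVN]+")


def is_valid_primer(sequence: str) -> bool:
    """True if the sequence contains only IUPAC nucleotide codes."""
    return PRIMER_PATTERN.fullmatch(sequence.strip().upper()) is not None


def is_valid_test_date(value: str) -> bool:
    """True if the value is a real calendar date written as YYYY-MM-DD."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g., 2024-02-30 has the right shape but is not a date
        return False
    return True


print(is_valid_primer("ACGTRYACGT"), is_valid_test_date("2024-02-29"))
```

Checks like these only confirm that a value is well formed; whether a primer or date is actually correct still has to be verified against laboratory records.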
Adherence to detailed experimental protocols is the foundation for generating reliable data that can be standardized. Below are generalized protocols for PCR and ELISA, two cornerstone techniques in pathogen detection.
The following protocol is adapted from procedures used to develop an optimized PCR for direct detection of bacteria in clinical samples, highlighting steps critical for data reporting [23].
1. Sample Preparation and DNA Extraction:
Record the specimen in the Sample type field.
2. PCR Reaction Setup:
Record the primers in the Forward primer sequence and Reverse primer sequence fields. The Primer citation field should reference the source of these primers.
3. PCR Amplification:
4. Analysis of PCR Products:
Record the outcome in the Test result field and, if positive, the identified Parasite species.

This protocol is based on the development of an I-ELISA for serological detection, a common method for large-scale screening of antibody response [23].
1. Antigen Coating:
The Probe target (the LPP antigen) must be documented.
2. Blocking:
3. Sample and Antibody Incubation:
4. Signal Detection and Analysis:
Record the Test result accordingly.

The following diagram illustrates the logical sequence for applying diagnostic-specific fields within the broader context of a wildlife disease study, from sample collection to data reporting.
Data Field Selection Workflow
The successful implementation of the protocols above relies on a suite of essential reagents and materials. The following table catalogs key solutions required for the featured experiments.
| Item | Function/Application | Example from Protocol |
|---|---|---|
| Primers (Oligonucleotides) | Short, single-stranded DNA sequences that bind complementary target DNA to initiate amplification by DNA polymerase in PCR. | Forward and reverse primers targeting a specific gene (e.g., LPP gene for M. morganii) [23]. |
| dNTPs (Deoxynucleotide Triphosphates) | The building blocks (A, T, C, G) used by DNA polymerase to synthesize a new DNA strand during PCR. | Component of the PCR master mix [23]. |
| Taq DNA Polymerase | A thermostable enzyme that synthesizes new DNA strands from dNTPs using primers as a starting point, essential for PCR. | The core enzyme in the PCR reaction [23]. |
| Antigen (e.g., LPP Protein) | A molecule that can be recognized by the immune system; used as a "probe" in ELISA to capture specific antibodies from a sample. | The purified M. morganii lipoprotein (LPP) used to coat ELISA plates [23]. |
| Primary and Secondary Antibodies | The primary antibody binds the antigen (or target antibody in indirect formats); the enzyme-conjugated secondary antibody binds the primary to produce a detectable signal. | Bovine serum antibodies (primary) and HRP-conjugated anti-bovine IgG (secondary) in the I-ELISA [23]. |
| Chromogenic Substrate (e.g., TMB) | A colorless solution that produces a colored, measurable product when cleaved by the enzyme (e.g., HRP) conjugated to the secondary antibody. | TMB (3,3',5,5'-Tetramethylbenzidine) substrate for the ELISA [23]. |
| DNA Extraction Kit | A set of optimized reagents for purifying high-quality genomic DNA from complex biological samples, removing inhibitors. | Used for standard PCR preparation from tissue or swab samples. |
| Blocking Buffer (e.g., BSA, Skim Milk) | A protein-rich solution used to cover non-specific binding sites on the ELISA plate to prevent false-positive signals. | 5% skim milk or 1% BSA used in the blocking step of the ELISA protocol [23]. |
Precisely navigating and populating diagnostic-specific fields is a critical step in implementing the minimum data standard for wildlife disease research. By meticulously documenting assay parameters, from primer sequences to probe targets, researchers transform raw data into a FAIR (Findable, Accessible, Interoperable, and Reusable) resource [1] [2]. This practice, demonstrated through the detailed protocols and field mappings for PCR and ELISA, ensures that valuable data on both positive and negative results can be aggregated and analyzed to answer larger-scale ecological and public health questions. Widespread adoption of this standardized approach is foundational to building a robust early warning system for emerging infectious diseases at the human-wildlife interface.
The establishment of a minimum data standard for wildlife disease research creates a foundation for consistent data collection. However, the full value of this standardized data is only realized when it is shared and archived in a manner that makes it Findable, Accessible, Interoperable, and Reusable (FAIR). This protocol provides detailed methodologies for depositing standardized wildlife disease datasets into FAIR-compliant repositories, a critical final step in the data lifecycle that ensures long-term preservation, accessibility, and utility for the global research community. Adhering to this protocol enhances transparency, supports data synthesis for broader ecological insights, and strengthens global health security by making critical wildlife health data actionable [1] [2].
The following diagram illustrates the complete workflow for preparing and sharing a wildlife disease dataset, from initial validation to final repository deposition.
Step 1: Data Validation Prior to Deposition
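Validate the dataset against the standard using the provided JSON Schema or the R package (wddsWizard) available from GitHub (github.com/viralemergence/wddsWizard) to check for completeness and conformity [1].
Step 2: FAIR Repository Selection
Where available, use specialist platforms such as PHAROS (pharos.viralemergence.org). These support richer, domain-specific metadata [1] [24].
Table 1: Criteria for Selecting a FAIR-Compliant Repository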
| Criterion | Minimum Requirement | Importance for FAIRness |
|---|---|---|
| Persistent Identifiers | Assigns a Digital Object Identifier (DOI) | Makes data Findable and Citable [24] |
| Metadata Standards | Supports rich, standard-compliant metadata (e.g., DataCite) | Enhances Interoperability and Reusability [24] |
| Clear Licensing | Allows application of open licenses (e.g., CC0, CC-BY) | Defines terms for Reuse [24] |
| Long-Term Preservation | Has documented preservation policy & backup routines | Ensures long-term Accessibility [24] |
| Open Access | Provides free and open access to data | Ensures Accessibility to all researchers [2] |
Step 3: Data and Metadata Packaging
Step 4: Repository Deposition and Publication
The following reagents and materials are essential for conducting the research that generates data compliant with the minimum standard.
Table 2: Essential Research Reagents and Materials for Wildlife Disease Studies
| Item | Function / Application |
|---|---|
| Sterile Swabs (oral, rectal, nasal) | Non-invasive collection of pathogen samples from live wildlife [1]. |
| Primer/Probe Sets (e.g., for coronavirus PCR) | Target-specific oligonucleotides for pathogen detection and identification via molecular methods like PCR [1]. |
| RNA/DNA Extraction Kits | Isolation of high-quality nucleic acids from diverse wildlife sample types (tissue, swab, blood) for downstream diagnostic testing. |
| ELISA Kits (Pathogen-specific) | Serological detection of pathogen exposure or infection through antibody or antigen recognition [1]. |
| Virus Transport Media (VTM) | Preservation of viral viability and nucleic acids during sample transport from the field to the laboratory. |
The table below provides a structured comparison of repository types to guide researchers in making an informed selection decision.
Table 3: Comparison of Data Repository Options for Wildlife Disease Data
| Repository Type | Key Feature | Example Platforms | Ideal Use Case |
|---|---|---|---|
| Domain-Specific | Rich, domain-relevant metadata fields; enhanced interoperability for specialists. | PHAROS, GBIF | Projects aiming for maximum impact and reuse within the wildlife disease ecology community [1] [24]. |
| Generalist | Broad disciplinary acceptance; simple and robust deposition process. | Zenodo, Dryad, OSF | When a dedicated wildlife disease repository is unavailable, or for projects of cross-disciplinary interest [24]. |
The wddsWizard R package can provide more user-friendly error messages [1].

Systematic archiving of standardized data, including negative results, in FAIR repositories is transformative. It prevents publication bias, enables robust meta-analyses, and provides the foundational data needed to track pathogen dynamics, understand the impacts of climate change, and ultimately improve pandemic early warning systems [1] [2]. This protocol operationalizes the final, crucial step in making wildlife disease research truly actionable for the global community.
The establishment of a minimum data standard for wildlife disease research represents a transformative advancement for ecological understanding and pandemic preparedness. This standard, comprising 40 data fields (9 required) and 24 metadata fields (7 required), enables the aggregation and analysis of disaggregated data at the finest possible spatial, temporal, and taxonomic scales [1]. However, the imperative for comprehensive data sharing, including historically underrepresented negative results, creates a critical tension with biosafety and biosecurity concerns [1] [2]. The very location data that provides essential context for disease ecology can simultaneously serve as a roadmap for those who might exploit this information to harm vulnerable species or ecosystems [26].
This Application Note addresses this fundamental challenge by providing practical protocols for sensitive data obfuscation that balance the FAIR (Findable, Accessible, Interoperable, Reusable) principles with necessary safety safeguards [2]. We outline methodologies that allow researchers to share data with sufficient scientific utility while implementing context-aware protections for sensitive information. These guidelines are particularly crucial for research involving endangered species, pathogens with high zoonotic potential, or locations where habitat disturbance represents an immediate conservation threat [26].
Sensitive biological data primarily falls into two interconnected domains with distinct risk profiles and protection requirements [26]:
Nature Conservation and Biodiversity Data: This category encompasses information about endangered species, protection regulations, and temporally sensitive ecological periods such as breeding seasons. Species listed on the National Biodiversity Network Atlas sensitive species list, Biodiversity Action Plans (BAP), or the IUCN Red List of Threatened Species often require careful data management considerations [26].
Biosafety and Biosecurity Data: This includes information about organisms posing direct threats to human, animal, or plant health, including emerging pathogens, genetically modified organisms, and particularly dangerous biological agents [26]. Research involving risk group 3 pathogens such as SARS-CoV, HIV, M.tb, H7N9, and Brucella conducted in Animal Biosafety Level 3 (ABSL-3) facilities exemplifies this category [27].
The decision to obfuscate data must follow a structured risk-benefit analysis. Potential harms from unregulated data sharing include [26]:
Table 1: Risk Classification for Wildlife Disease Data
| Risk Level | Data Characteristics | Potential Harm | Obfuscation Requirement |
|---|---|---|---|
| Low | Common species, low-pathogenicity organisms, broad regional data | Minimal | Standard sharing with complete metadata |
| Moderate | Species of concern, seasonal sensitivities, moderate pathogenicity | Habitat disturbance, research interference | Moderate geographical generalization |
| High | Endangered species, high-consequence pathogens, precise locations | Poaching, population decline, biosecurity breach | Significant obfuscation with controlled access |
| Critical | Critically endangered species, select agents, exact coordinates | Extinction risk, mass mortality, bioterrorism | Restricted access, data use agreements |
Purpose: To reduce risks associated with precise location data while maintaining scientific utility for ecological and epidemiological analysis.
Materials and Equipment:
Procedure:
Validation:
Purpose: To implement a tiered data access system that matches protection levels with specific user needs and credentials.
Materials and Equipment:
Procedure:
Validation:
Purpose: To ensure biosafety and biosecurity when handling and sharing data involving risk group 3 and 4 pathogens or particularly sensitive species.
Materials and Equipment:
Procedure:
Validation:
The following workflow diagram illustrates the logical decision process for implementing appropriate data obfuscation strategies based on dataset characteristics:
Data Obfuscation Decision Workflow
Table 2: Research Reagent Solutions for Data Obfuscation Implementation
| Tool Category | Specific Solution | Function | Implementation Example |
|---|---|---|---|
| Data Repositories | PHAROS (Pathogen Harmonized Observatory) | Specialist platform for wildlife disease data with access controls | Primary repository for standardized wildlife disease data [1] |
| Data Repositories | Zenodo | Generalist repository with DOI assignment and access restrictions | Backup repository with embargo capabilities for sensitive data |
| Data Repositories | GBIF (Global Biodiversity Information Facility) | Biodiversity data infrastructure with sensitive data processing | Publication of generalized occurrence data following Darwin Core [1] |
| Geospatial Tools | GIS Software with Random Offset Algorithms | Coordinate generalization and translation | Implementing spatial obfuscation protocols |
| Geospatial Tools | Administrative Boundary Datasets | Regional context for generalized data | Replacing coordinates with county/municipality names |
| Access Control Systems | Data Use Agreement Templates | Legal frameworks for restricted data sharing | Establishing terms for controlled access data |
| Access Control Systems | User Authentication Platforms | Identity verification for tiered access | Implementing registered and controlled access tiers |
| Reporting Standards | Minimum Data Standard Templates | Standardized formatting for wildlife disease data | Ensuring consistent documentation of obfuscation methods [1] |
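The coordinate generalization listed in the table above can be illustrated with a short Python sketch. It uses simple rounding rather than the random-offset approach the table mentions, and the function name and default precision are assumptions for illustration, not part of the standard.

```python
def generalize_coordinates(latitude, longitude, decimals=1):
    """Round coordinates to a fixed number of decimal places.

    One decimal place corresponds to roughly 11 km of latitude. The
    appropriate precision for a given risk level is a judgment call
    that this sketch does not make.
    """
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    return round(latitude, decimals), round(longitude, decimals)


# Hypothetical capture site, generalized to one decimal place.
print(generalize_coordinates(17.2534, -88.7698))  # → (17.3, -88.8)
```

Whatever precision is chosen should be recorded alongside the dataset so that later users know the coordinates were generalized.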
The protocols outlined in this Application Note provide a practical framework for implementing the minimum data standard for wildlife disease research while addressing legitimate biosafety and conservation concerns. By integrating these data obfuscation methodologies into research workflows, scientists can contribute to the growing aggregate of wildlife disease data without compromising vulnerable species or ecosystems.
The successful implementation of these guidelines requires ongoing collaboration between disease ecologists, conservation biologists, data scientists, and biosafety professionals. As the field evolves, these protocols should be regularly refined to address emerging challenges and technological advancements. Through careful application of these principles, the wildlife disease research community can uphold its dual commitment to scientific transparency and ecological stewardship, strengthening both global health security and biodiversity conservation.
Wildlife disease research generates complex data relationships that challenge conventional data management practices. Repeated sampling of individuals, the use of pooled testing strategies, and the requirement for confirmatory assays create intricate data structures that must be meticulously documented to ensure scientific rigor and reproducibility [1]. Within the framework of the new minimum data standard for wildlife disease research, researchers now have a standardized approach for handling these complexities while maintaining FAIR (Findable, Accessible, Interoperable, and Reusable) principles [1] [2].
This protocol provides detailed methodologies for implementing the wildlife disease data standard across three common complex scenarios, enabling researchers to maintain data integrity while accommodating real-world research designs. The standard's flexible structure centers on a "tidy data" model where each row corresponds to a single diagnostic test outcome, with appropriate linking fields to connect related observations [1].
The minimum data standard for wildlife disease research establishes 40 core data fields across three categories: sample data, host animal data, and parasite/pathogen data [1]. Of these, nine fields are mandatory for basic compliance, while the remaining fields provide essential context for specific study designs. The standard intentionally uses a "tidy data" structure where each record represents a single observation at the finest possible spatial, temporal, and taxonomic scale [1].
Table 1: Required Data Fields for Complex Study Designs
| Field Category | Field Name | Data Type | Requirement Level | Application to Complex Scenarios |
|---|---|---|---|---|
| Sample Data | Sample ID | String | Required | Critical for all scenarios; must be unique across all databases |
| Sample Data | Animal ID | String | Conditional | Required for repeated sampling; may be blank for pooled tests |
| Sample Data | Sample Date | Date | Required | Essential for temporal analysis in longitudinal studies |
| Sample Data | Sample Type | String | Required | Must specify specimen type (e.g., oral swab, blood, tissue) |
| Host Data | Host Identification | String | Required | Linnaean classification at lowest possible level |
| Host Data | Organism Sex | String | Optional | Recommended for host-level analyses |
| Host Data | Host Life Stage | String | Optional | Important for epidemiological interpretations |
| Parasite Data | Test Result | String | Required | Positive, negative, or indeterminate outcome |
| Parasite Data | Test Name | String | Required | Specific assay name (e.g., "pan-coronavirus PCR") |
| Parasite Data | Pathogen Taxon | String | Conditional | Required for positive results; links to genetic data |
Table 2: Specialized Fields for Complex Testing Scenarios
| Testing Scenario | Specialized Fields | Data Type | Purpose |
|---|---|---|---|
| Molecular Assays (PCR) | Forward Primer Sequence | String | Documents primer used for replication |
| Molecular Assays (PCR) | Reverse Primer Sequence | String | Documents primer used for replication |
| Molecular Assays (PCR) | Gene Target | String | Specific genetic target of assay |
| Serological Assays (ELISA) | Probe Target | String | Antigen or antibody target |
| Serological Assays (ELISA) | Probe Type | String | Type of probe used in assay |
| All Confirmatory Tests | Primer/Probe Citation | String | Reference for published assay protocols |
| Genetic Sequencing | GenBank Accession | String | Links to public genetic database records |
| Pooled Testing | Pool Size | Integer | Number of specimens in pool |
| Pooled Testing | Pool ID | String | Unique identifier for the pool |
Purpose: To document longitudinal studies where the same individual animal is sampled multiple times over a period, enabling analysis of infection dynamics, pathogen persistence, and immune responses.
Methodology:
Data Management Considerations: The same Animal ID should be used across all databases and physical resources, including field notes, laboratory records, and public repositories [1]. This creates a persistent identifier that connects all observations from the same individual.
Repeated Sampling Data Relationships: This workflow demonstrates how a single animal identifier links multiple temporal sampling events and their resulting pathogen data.
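As a rough illustration of how a persistent Animal ID links repeated observations, the Python sketch below groups tidy rows by animal and orders each animal's results by sample date. The records, key names, and the `infection_histories` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical tidy records: one row per diagnostic test outcome.
rows = [
    {"animal_id": "BAT-01", "sample_id": "S3",
     "sample_date": "2024-09-02", "test_result": "negative"},
    {"animal_id": "BAT-01", "sample_id": "S1",
     "sample_date": "2024-03-10", "test_result": "negative"},
    {"animal_id": "BAT-02", "sample_id": "S2",
     "sample_date": "2024-03-10", "test_result": "positive"},
    {"animal_id": "BAT-01", "sample_id": "S4",
     "sample_date": "2024-06-21", "test_result": "positive"},
]


def infection_histories(rows):
    """Group rows by animal and order each animal's results by date."""
    by_animal = defaultdict(list)
    for row in rows:
        by_animal[row["animal_id"]].append(row)
    # ISO 8601 date strings sort chronologically as plain text.
    return {
        animal: [(r["sample_date"], r["test_result"])
                 for r in sorted(tests, key=lambda r: r["sample_date"])]
        for animal, tests in by_animal.items()
    }


histories = infection_histories(rows)
print(histories["BAT-01"])
```

Because every row carries both identifiers, the same table supports per-sample and per-animal analyses without restructuring.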
Purpose: To efficiently screen populations for low-prevalence pathogens while conserving resources, using statistical methods that account for the group testing approach [29].
Methodology:
Statistical Considerations: Pooled testing efficiency depends on disease prevalence, with optimal pool sizes varying according to expected prevalence rates [29]. The diagnostic sensitivity and specificity of the assay must be accounted for in prevalence estimation, as pooling can affect test performance characteristics.
Pooled Testing Workflow: This diagram illustrates the relationship between individual specimens, their assignment to testing pools, and the retesting process for positive pools.
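To make the prevalence calculation concrete, the sketch below applies a standard estimator for equal-sized pools: with x positive pools out of n, each containing k specimens, the estimate is 1 - (1 - x/n)^(1/k). This simplified form assumes a perfect assay and therefore does not include the sensitivity and specificity adjustment noted above. The function name and example numbers are hypothetical.

```python
def pooled_prevalence(positive_pools, total_pools, pool_size):
    """Estimate individual-level prevalence from equal-sized pools.

    Assumes a perfect assay: a pool is negative only if all of its
    specimens are negative, so P(pool negative) = (1 - p) ** pool_size.
    """
    if total_pools <= 0 or pool_size <= 0:
        raise ValueError("total_pools and pool_size must be positive")
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must be between 0 and total_pools")
    negative_fraction = 1 - positive_pools / total_pools
    return 1 - negative_fraction ** (1 / pool_size)


# Hypothetical survey: 10 of 50 pools positive, 5 specimens per pool.
estimate = pooled_prevalence(10, 50, 5)
print(round(estimate, 4))
```

Recording Pool Size and Pool ID for every pooled test is what makes this calculation possible for anyone reusing the data.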
Purpose: To document comprehensive testing protocols where initial screening tests are followed by confirmatory or supplemental assays to validate results or obtain additional pathogen characterization.
Methodology:
Implementation Example: In a coronavirus surveillance study, an oral swab testing positive by pan-coronavirus PCR might be confirmed through Sanger sequencing of the RdRp gene, with the resulting sequence deposited in GenBank and the accession number recorded in the dataset [1].
Table 3: Example Testing Sequence for Confirmatory Workflow
| Testing Stage | Sample ID | Test Name | Test Result | Pathogen Taxon | GenBank Accession |
|---|---|---|---|---|---|
| Initial Screening | OS_BZ19-114 | Pan-coronavirus PCR | Positive | Not specified | - |
| Confirmatory Test | OS_BZ19-114 | Coronavirus RdRp sequencing | Positive | Alphacoronavirus | OR123456 |
| Supplemental Data | OS_BZ19-114 | Whole genome sequencing | Positive | Alphacoronavirus | OR123457 |
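The rows of Table 3 can be treated as ordinary tidy records that share a Sample ID. The Python sketch below, with hypothetical key names and helper, collects the GenBank accessions linked to each sample across its screening and confirmatory tests.

```python
# Rows transcribed from Table 3; None stands in for the "-" placeholder.
tests = [
    {"sample_id": "OS_BZ19-114", "test_name": "Pan-coronavirus PCR",
     "test_result": "Positive", "genbank_accession": None},
    {"sample_id": "OS_BZ19-114", "test_name": "Coronavirus RdRp sequencing",
     "test_result": "Positive", "genbank_accession": "OR123456"},
    {"sample_id": "OS_BZ19-114", "test_name": "Whole genome sequencing",
     "test_result": "Positive", "genbank_accession": "OR123457"},
]


def accessions_by_sample(tests):
    """Map each sample ID to the accessions recorded across its tests."""
    linked = {}
    for test in tests:
        accessions = linked.setdefault(test["sample_id"], [])
        if test["genbank_accession"]:
            accessions.append(test["genbank_accession"])
    return linked


print(accessions_by_sample(tests))
# → {'OS_BZ19-114': ['OR123456', 'OR123457']}
```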
Table 4: Essential Research Reagents for Wildlife Disease Studies
| Reagent Category | Specific Examples | Function/Application | Implementation Notes |
|---|---|---|---|
| Nucleic Acid Extraction Kits | QIAamp Viral RNA Mini Kit, DNeasy Blood & Tissue Kit | Isolation of pathogen genetic material from various sample matrices | Document extraction method in sample processing metadata |
| PCR Master Mixes | OneTaq Quick-Load Master Mix, Luna Universal Probe Master Mix | Amplification of target pathogen sequences | Record primer sequences and gene targets in specialized fields |
| Specific Primers/Probes | Pan-coronavirus primers (e.g., RdRp gene), influenza A matrix protein primers | Target-specific pathogen detection | Include primer citations linking to published assays |
| Serological Assays | ELISA kits for specific pathogens, multiplex immunoassays | Detection of pathogen exposure through antibody response | Document probe target and type in specialized fields |
| Positive Controls | Synthetic RNA controls, quantified pathogen standards | Assay validation and quality control | Essential for establishing test sensitivity and specificity |
| Next-Generation Sequencing Kits | Illumina RNA Prep with Enrichment, Oxford Nanopore kits | Comprehensive pathogen characterization | Link resulting sequences to public databases via accession numbers |
Implement a multi-step validation process to ensure data quality and standard compliance:
Share compliant datasets through appropriate repositories to maximize findability and reuse:
Implementing the minimum data standard for complex wildlife disease research scenarios ensures that valuable data collected through sophisticated study designs remains findable, accessible, interoperable, and reusable. By following these detailed protocols for repeated sampling, pooled testing, and confirmatory assays, researchers can contribute to a growing ecosystem of standardized data that supports synthetic analyses, ecological forecasting, and evidence-based public health decision-making. The provided tools, including template files, validation software, and clear implementation guidelines, lower the barrier to adoption while significantly enhancing research transparency and impact.
The establishment of a minimum data standard for wildlife disease research creates an urgent need for interoperability with established biodiversity data infrastructures. Molecular methodologies now enable documenting organisms from inconspicuous taxa or through non-invasive sampling, generating data that extends beyond traditional ecological observations [31]. This data, comprising sequences with temporal and spatial context, represents valuable occurrence records that serve broader purposes beyond their original molecular ecology or phylogenetic research focus [31].
This protocol details the integration between emerging wildlife disease reporting standards and three foundational frameworks: the Darwin Core (DwC) data standard for biodiversity information, the Global Biodiversity Information Facility (GBIF) network for data publishing and discovery, and the GenBank repository for genetic sequence data. We provide a structured approach for researchers to maximize the impact, reuse, and interoperability of their wildlife disease data through standardized submission pathways.
Wildlife disease datasets exhibit tremendous heterogeneity in scope, granularity, and reporting formats, often omitting critical metadata about sampling effort, location, or host-level information [1]. This variability creates significant barriers to data synthesis, reproducibility, and reuse. Many studies report only summary statistics or positive results, making it impossible to disaggregate data back to the host level for comparative analyses across populations, species, or temporal scales [1].
The recently proposed minimum data standard for wildlife disease research addresses these challenges through 40 core data fields (9 required) and 24 metadata fields (7 required) that capture information at the finest possible spatial, temporal, and taxonomic scale [1]. This standard organizes information into three categories: sample data, host animal data, and parasite data (including diagnostic test results) [1].
Darwin Core provides a stable, well-adopted foundation for sharing biodiversity data through standardized terms and vocabularies [32]. Its flexibility enables handling complex datasets from diverse research and surveillance sources across local to international scales [32]. Ongoing developments, including a new conceptual model and Data Package Guide currently under public review (September - December 2025), promise enhanced capabilities for representing complex data relationships [33].
GBIF serves as the primary global network for publishing and discovering biodiversity data, supporting four dataset classes: metadata-only resources, species checklists, occurrence data, and sampling-event data [34] [35]. The GBIF infrastructure has demonstrated capacity for integrating DNA-derived occurrences [31] and wildlife disease data [32].
GenBank, as part of the International Nucleotide Sequence Database Collaboration (INSDC), represents the foundational repository for genetic sequence data, with daily data exchange between DDBJ, ENA, and GenBank ensuring comprehensive coverage [36] [37]. Most journals require sequences cited in articles to be submitted to INSDC repositories as part of the publication process [36].
The alignment between wildlife disease data standards and Darwin Core enables representation of disease occurrences within broader biodiversity contexts. Table 1 illustrates the mapping between core wildlife disease fields and corresponding Darwin Core terms.
Table 1: Mapping between wildlife disease data standard fields and Darwin Core terms
| Wildlife Disease Field Category | Example Wildlife Disease Fields | Darwin Core Term | Mapping Notes |
|---|---|---|---|
| Host Information | hostScientificName | scientificName | Direct mapping to identified organism |
| Host Information | hostCommonName | vernacularName | Common name of host species |
| Host Information | hostTaxonID | taxonID | Taxon identifier from authority |
| Temporal Context | collectionDate | eventDate | Direct mapping of sampling date |
| Geospatial Context | decimalLatitude | decimalLatitude | Direct coordinate mapping |
| Geospatial Context | decimalLongitude | decimalLongitude | Direct coordinate mapping |
| Geospatial Context | locationID | locationID | Identifier for sampling location |
| Sample Context | animalID | organismID | Identifier for specific host individual |
| Sample Context | sampleID | materialSampleID | Identifier for physical sample |
| Sample Context | samplingProtocol | samplingProtocol | Method used for sample collection |
This mapping enables wildlife disease records to be structured as Darwin Core Occurrences or Sampling Events, with the host organism representing the occurrence and diagnostic results extending the standard through qualified relationships or extensions [32]. The Darwin Core standard has demonstrated flexibility to handle complex wildlife disease datasets while maintaining interoperability with broader biodiversity informatics platforms [32].
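The mapping in Table 1 amounts to a column rename, sketched below in Python. The field-to-term pairs come from the table; the `to_darwin_core` helper, the `testResult` key, and the example values are hypothetical, and fields without a Darwin Core equivalent are returned separately rather than dropped.

```python
# Field-to-term pairs taken from Table 1.
WDDS_TO_DWC = {
    "hostScientificName": "scientificName",
    "hostCommonName": "vernacularName",
    "hostTaxonID": "taxonID",
    "collectionDate": "eventDate",
    "decimalLatitude": "decimalLatitude",
    "decimalLongitude": "decimalLongitude",
    "locationID": "locationID",
    "animalID": "organismID",
    "sampleID": "materialSampleID",
    "samplingProtocol": "samplingProtocol",
}


def to_darwin_core(record):
    """Rename mapped fields; return (dwc_record, unmapped_fields)."""
    dwc, unmapped = {}, {}
    for field, value in record.items():
        if field in WDDS_TO_DWC:
            dwc[WDDS_TO_DWC[field]] = value
        else:
            unmapped[field] = value
    return dwc, unmapped


# Hypothetical record; testResult has no entry in Table 1.
record = {
    "hostScientificName": "Desmodus rotundus",
    "collectionDate": "2019-04-12",
    "animalID": "BZ19-114",
    "sampleID": "OS_BZ19-114",
    "testResult": "positive",
}
dwc, unmapped = to_darwin_core(record)
print(dwc)
print(unmapped)
```

Returning the unmapped fields makes it explicit which diagnostic details still need an extension or qualified relationship on the Darwin Core side.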
The integration of wildlife disease data with existing standards follows a sequential pathway that ensures proper deposition of both genetic sequences and associated occurrence context. Figure 1 illustrates this comprehensive workflow, from initial data collection through to integrated publication.
Figure 1: Integrated workflow for submitting wildlife disease data to GenBank and GBIF, showing the parallel processing of genetic sequences and contextual data that converge in published, linked datasets.
For DNA-derived data, the connection between occurrence records and sequence information creates particularly powerful linkages. A sequence with coordinates and timestamp represents a valuable biodiversity occurrence that transcends its original molecular context [31]. GBIF guidelines specifically address publishing these DNA-derived occurrences, which document taxa identified through molecular methods rather than physical specimens [31].
The connection between platforms is maintained through the GenBank accession number, which should be included in the corresponding Darwin Core record using appropriate terms such as references or associatedSequences [31]. This bidirectional linking enables users to discover genetic sequences through biodiversity portals and find occurrence context through genetic databases.
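A minimal sketch of this linking step in Python follows. It assumes bare accession numbers are stored in `associatedSequences` and joined with " | ", the list separator Darwin Core recommends. The `link_sequences` helper and the example values are hypothetical.

```python
def link_sequences(dwc_record, accessions):
    """Return a copy of the record with accessions in associatedSequences."""
    linked = dict(dwc_record)
    current = linked.get("associatedSequences", "")
    existing = [a for a in current.split(" | ") if a]
    for accession in accessions:
        # Skip accessions that are already listed.
        if accession not in existing:
            existing.append(accession)
    linked["associatedSequences"] = " | ".join(existing)
    return linked


occurrence = {"organismID": "BZ19-114", "materialSampleID": "OS_BZ19-114"}
linked = link_sequences(occurrence, ["OR123456", "OR123457"])
print(linked["associatedSequences"])  # → OR123456 | OR123457
```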
GenBank provides multiple submission pathways, each optimized for different data types and volumes:
Pre-submission preparation: Assemble sequence data, annotation information, and source metadata including host organism, collection date, and geographic coordinates
Tool selection: Choose appropriate submission tool based on sequence type, volume, and annotation complexity [36]
Data validation: Submit sequences through selected tool, receiving automatic validation feedback
Accession number assignment: GenBank processes submissions within approximately two working days, providing accession numbers for manuscript citation [36]
Confidentiality management: Request delayed public release if pre-publication confidentiality is required, with understanding that appearance in print triggers immediate release [36]
Post-publication linking: Notify GenBank of publication details to connect sequence records with resulting literature [36]
For raw, unassembled reads from next-generation sequencing platforms (e.g., Illumina, PacBio), submission should be directed to the Sequence Read Archive (SRA) rather than GenBank [36].
Before publishing through GBIF, researchers must:
Secure institutional agreements: Alert administrators of plans to publish data through GBIF network, emphasizing increased visibility and impact through traditional academic publications and specimen loans [34]
Request endorsement: Organizations must request endorsement from GBIF community, reviewing data publisher agreement and committing to principle of data sharing [34]
Select publishing tools: Choose between GBIF's Integrated Publishing Toolkit (IPT), national Living Atlases installations, or programmatic API for dataset registration [34]
Dataset class selection: Identify appropriate data class from: metadata-only, checklist, occurrence, or sampling-event data [35]
Darwin Core transformation: Structure data tables using Darwin Core terms as column names, utilizing Excel templates for required and recommended terms [35]
Data validation: Use GBIF Data Validator to check datasets prior to publication, receiving specific recommendations for improving data quality [34]
Licensing selection: Assign one of three Creative Commons licenses: CC0 (no restrictions), CC BY (attribution required), or CC BY-NC (non-commercial with attribution) [34]
IPT publication: Upload data to IPT, map to appropriate core (Taxon, Occurrence, or Event), complete resource metadata, and register dataset with GBIF [35]
The minimum data standard for wildlife disease research should be implemented through a structured five-step process:
Fit-for-purpose assessment: Verify dataset describes wild animal samples examined for parasites with host identification, diagnostic methods, outcomes, parasite identification, and spatiotemporal context [1]
Standard tailoring: Consult field lists to identify applicable fields beyond required elements, appropriate ontologies for free-text fields, and potential need for additional study-specific fields [1]
Data formatting: Utilize template files (.csv or .xlsx format) to structure data according to standard specifications [1]
Data validation: Employ provided JSON Schema or R package (wddsWizard) with convenience functions to validate data and metadata against the standard [1]
Data sharing: Deposit validated data in findable, open-access generalist repository (e.g., Zenodo) and/or specialist platform (e.g., PHAROS - Pathogen Harmonized Observatory) [1]
After independent submission to GenBank and GBIF, researchers should verify the bidirectional linkage between platforms:
Table 2: Essential research reagents and computational tools for standardized wildlife disease data management
| Tool/Resource | Type | Primary Function | Access |
|---|---|---|---|
| GBIF IPT (Integrated Publishing Toolkit) | Software platform | Dataset publication to GBIF network | https://ipt.gbif.org/ |
| BankIt | Web submission tool | GenBank sequence submission for single sequences/small batches | https://www.ncbi.nlm.nih.gov/WebSub/ |
| table2asn | Command-line program | Automated GenBank submission for large batches/annotated genomes | https://www.ncbi.nlm.nih.gov/genbank/table2asn/ |
| GBIF Data Validator | Data quality tool | Pre-publication dataset checking and improvement recommendations | https://www.gbif.org/tools/data-validator |
| WDDS Wizard (R package) | Validation tool | Wildlife disease data standard validation against JSON Schema | github.com/viralemergence/wddsWizard |
| Darwin Core Excel Templates | Data structuring aid | Spreadsheet templates for formatting data to Darwin Core standards | https://www.gbif.org/publishing-data |
| PHAROS Platform | Specialist repository | Wildlife disease data repository with standard implementation | pharos.viralemergence.org |
The integration of emerging wildlife disease data standards with established biodiversity infrastructures represents a critical advancement for the field. By implementing the protocols outlined in this document, researchers can ensure their data achieves maximum impact, interoperability, and reuse across ecological, taxonomic, and public health domains. The parallel submission pathways to GenBank and GBIF, connected through standardized mappings and bidirectional linkages, create a powerful framework for understanding disease dynamics in the context of broader biodiversity patterns.
As standards continue to evolve, including ongoing developments in Darwin Core [33] and refinements to wildlife disease reporting [1], the foundational integration approaches described here will provide a stable basis for future enhancements. The scientific community's adoption of these standardized protocols will accelerate synthetic research, enable more robust ecological analyses, and ultimately strengthen our capacity to understand and mitigate wildlife disease threats in a changing world.
The emergence of a minimum data standard for wildlife disease research marks a pivotal advancement for ecological health and global health security [2]. This standard, developed through a collaboration of academic and public health institutions, provides a foundational framework for collecting, managing, and sharing wildlife disease data [1] [12]. Its primary objective is to enhance the transparency, reusability, and global utility of data critical for tracking emerging infectious threats and understanding ecosystem health [2]. Adherence to this standard ensures that data collection and sharing practices are not only scientifically rigorous but also align with evolving ethical considerations and legal obligations, thereby fostering responsible research conduct and bolstering pandemic preparedness [1] [2].
The proposed minimum data standard is designed to be both comprehensive and flexible, accommodating diverse study designs and methodologies while ensuring core data elements are consistently reported [1] [12]. It encompasses 40 core data fields, of which 9 are mandatory, and 24 metadata fields, with 7 required to provide essential project-level context [1] [12] [2].
Table 1: Required Core Data Fields (n=9)
| Variable Category | Field Name | Descriptor |
|---|---|---|
| Sampling | Sample ID | A researcher-generated unique ID for the sample (e.g., "OS BZ19-114") [12]. |
| Host Organism | Host identification | The Linnaean classification of the animal, ideally to species level (e.g., "Odocoileus virginianus") [12]. |
| Parasite/Pathogen | Diagnostic method | The technique used to identify the parasite (e.g., "PCR," "ELISA," "culture") [1]. |
| Parasite/Pathogen | Test result | The outcome of the diagnostic test (e.g., "positive," "negative," "inconclusive") [1]. |
| Parasite/Pathogen | Parasite identity | The identity of the detected parasite, reported at the lowest possible taxonomic level [1]. |
| Spatio-temporal | Date of sample collection | The date the sample was taken from the host animal [1]. |
| Spatio-temporal | Location name | A researcher-assigned name for the sampling location [1]. |
| Spatio-temporal | Latitude | The latitude of the sampling location in decimal degrees (WGS84) [1]. |
| Spatio-temporal | Longitude | The longitude of the sampling location in decimal degrees (WGS84) [1]. |
Table 2: Selected Supplementary Core Data Fields
| Variable Category | Field Name | Descriptor |
|---|---|---|
| Host Organism | Organism sex | The sex of the individual animal [12]. |
| Host Organism | Host life stage | The life stage of the animal (e.g., "juvenile," "adult") [12]. |
| Host Organism | Mass & Mass units | The mass of the animal at sampling and corresponding units [12]. |
| Parasite/Pathogen | Gene target | The gene targeted for amplification (for PCR tests) [1]. |
| Parasite/Pathogen | Primer citation | A citation for the primer set used [1]. |
| Parasite/Pathogen | GenBank accession | Accession number for pathogen genetic sequence data submitted to GenBank [1]. |
This standard explicitly requires the reporting of negative test results and data disaggregated to the finest possible spatial, temporal, and taxonomic scale, which are often omitted but are critical for robust prevalence estimates and meta-analyses [1] [2]. The standard's structure facilitates the creation of "tidy data," where each row represents a single diagnostic test measurement, ensuring optimal re-use [1] [12].
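A short sketch shows why negative results matter for prevalence. The records below are made up for illustration, with one row per diagnostic test. The `prevalence` helper is hypothetical, and excluding inconclusive tests from the denominator is an assumption of this sketch rather than a rule of the standard.

```python
# Illustrative records only: one row per diagnostic test, negatives included.
# The values are invented for this sketch and do not come from any study.
tests = [
    {"Sample ID": "S1", "Test result": "positive"},
    {"Sample ID": "S2", "Test result": "negative"},
    {"Sample ID": "S3", "Test result": "negative"},
    {"Sample ID": "S4", "Test result": "inconclusive"},
]


def prevalence(rows):
    """Proportion positive among tests with a definite result (assumption:
    inconclusive tests are excluded from the denominator)."""
    definite = [r for r in rows if r["Test result"] in ("positive", "negative")]
    if not definite:
        return None
    positives = sum(r["Test result"] == "positive" for r in definite)
    return positives / len(definite)


with_negatives = prevalence(tests)
positives_only = prevalence([r for r in tests if r["Test result"] == "positive"])
print(with_negatives)  # 1 positive out of 3 definite results
print(positives_only)  # denominator is lost when negatives are not reported
```

When only positive records are shared, every dataset appears to have 100% prevalence, which is why the denominator has to travel with the data.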
Ethical compliance in wildlife disease research extends beyond institutional animal care protocols. The collection and sharing of high-resolution data, particularly location information for threatened species, introduces significant ethical responsibilities regarding animal privacy and well-being [2] [38].
True ethical compliance requires challenging structural barriers within research ethics policies that marginalize Indigenous voices and Knowledge systems [39]. Western research paradigms often prioritize quantifiable indicators of animal welfare and the production of knowledge as capital, which can conflict with Indigenous worldviews that emphasize relationality and reciprocal responsibilities to wildlife [39].
Researchers should:
While wildlife disease data itself may not constitute "personal data" in the traditional sense, the infrastructure and principles of data protection laws provide a critical compliance framework, especially when data involves location-based information or is managed by entities operating in regulated jurisdictions.
The World Organisation for Animal Health (WOAH) plays a central role in the global legal and regulatory landscape for animal disease reporting [42]. Its WAHIS (World Animal Health Information System) platform provides a homogeneous tool for members to report listed diseases in both domestic and wild animals, facilitating a near real-time global picture of disease status [42]. Integrating national wildlife disease surveillance data into international reporting mechanisms like WAHIS is a critical step in legal compliance and fostering a One Health approach to managing risks at the human-animal-ecosystem interface [42].
This protocol outlines the steps for formatting, validating, and sharing a wildlife disease dataset according to the minimum data standard, using a hypothetical coronavirus surveillance study in bats as an example.
Decide which optional fields apply (e.g., Host life stage, Mass) and which ontologies to use for free-text fields [1].
Enter the data using the .csv or .xlsx templates [1].
Record one row per test. For example, a single bat (BZ19-114) provides oral and rectal swabs (Sample IDs: OS BZ19-114, RS BZ19-114), which are tested via PCR. This generates two rows in the dataset, one for each test [1].
For a positive test, complete Parasite identity and GenBank accession. For a negative test, these fields are left blank, but the record is still included [1].
Use the JSON Schema or R package (wddsWizard) to validate the dataset's structure and required fields against the standard [1].
The following workflow diagram summarizes this data preparation and sharing process.
Table 3: Essential Resources for Standard-Compliant Wildlife Disease Research
| Tool / Resource | Type | Function in Compliance & Research |
|---|---|---|
| WDDS Templates | Data Template | Pre-formatted .csv/.xlsx files providing the correct structure for data entry, ensuring adherence to the 40-field standard [1]. |
| WDDS JSON Schema / R Package | Validation Tool | Machine-readable schema and R package (wddsWizard) for validating dataset structure and fields against the standard [1]. |
| PHAROS Database | Data Repository | A dedicated platform for archiving and sharing wildlife disease data that aligns with the minimum data standard, enhancing findability and reusability [1]. |
| GBIF / Darwin Core | Data Standard | A biodiversity data standard; interoperability with this and other standards (e.g., MIReAD) is a core feature of the wildlife disease standard [1] [12]. |
| GenBank / SRA | Data Repository | Specialized archives for pathogen genetic sequence data; the standard assumes sequence data is deposited here and linked via the GenBank accession field [1]. |
| WOAH-WAHIS | Reporting System | The global official system for reporting listed animal diseases to international authorities, a key destination for standardized national surveillance data [42]. |
In the field of wildlife disease research, the lack of standardized data reporting has long hindered the ability to aggregate datasets, compare findings across studies, and conduct robust synthetic analyses. This limitation is particularly problematic for understanding emerging zoonotic threats and ecological health, where data fragmentation can obscure critical patterns in pathogen distribution and dynamics. To address this challenge, the community has developed a minimum data standard for wildlife disease research and surveillance [1] [12]. This standard establishes a common framework for reporting key elements of disease studies, enabling improved data sharing, reuse, and aggregation in alignment with FAIR principles (Findable, Accessible, Interoperable, and Reusable) [2].
The theoretical foundation of this standard centers on the use of "tidy data" principles, where each row corresponds to a single diagnostic test outcome, creating a disaggregated record structure that preserves the finest spatial, temporal, and taxonomic resolution [1] [12]. However, adopting any data standard requires practical tools for implementation and validation. This is where the wddsWizard R package and its associated JSON Schema provide critical infrastructure, offering researchers a streamlined pathway to standardize and validate their datasets against the community-defined requirements [30].
The wddsWizard package serves as a bridge between researchers' native datasets and the formal requirements of the Wildlife Disease Data Standard. Developed to support the standardization of wildlife disease data, this package provides a suite of functions that enable researchers to validate their data structures programmatically [30]. The package is openly available through GitHub, reflecting its development as a community resource rather than a proprietary tool.
At its core, the package implements the validation logic through integration with the jsonvalidate package in R, using the AJV (Another JSON Schema Validator) engine to perform rigorous checks against the standard's formal specification [30]. This implementation ensures that validation occurs consistently and reproduces the same results across computing environments, a critical feature for collaborative research projects and data aggregation initiatives.
The Wildlife Disease Data Standard is formally defined through a JSON Schema, which provides a machine-readable specification of the required data structure, fields, and constraints [30] [1]. JSON Schema is a widely adopted standard for validating JSON documents, making it an interoperable choice for defining data structures that may be used across multiple programming languages and platforms.
The schema encapsulates all requirements of the data standard, including:
This schema serves as the single source of truth for what constitutes a valid dataset, ensuring that all tools and platforms implementing the standard do so consistently.
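To make the idea of a machine-readable specification concrete, the sketch below defines a toy schema in JSON Schema style and checks two records against it. It is not the real WDDS schema: the property names, the set of required fields, and the `check_record` helper are illustrative assumptions. The hand-written checker handles only the `required` and `type` keywords, whereas a real workflow would use a full JSON Schema validator.

```python
# A toy schema in JSON Schema style. It is NOT the real WDDS schema:
# the property names and the choice of required fields are illustrative.
TOY_SCHEMA = {
    "type": "object",
    "required": ["Sample ID", "Host identification", "Test result"],
    "properties": {
        "Sample ID": {"type": "string"},
        "Host identification": {"type": "string"},
        "Test result": {"type": "string"},
        "Latitude": {"type": "number"},
    },
}

# Python types accepted for the two JSON Schema types this sketch supports.
JSON_TYPES = {"string": str, "number": (int, float)}


def check_record(record, schema):
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    for name in schema["required"]:
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, rule in schema["properties"].items():
        if name in record:
            value = record[name]
            # bool is a subclass of int in Python, so reject it explicitly for "number"
            wrong_type = not isinstance(value, JSON_TYPES[rule["type"]]) or (
                rule["type"] == "number" and isinstance(value, bool)
            )
            if wrong_type:
                problems.append(f"{name}: expected {rule['type']}")
    return problems


good = {"Sample ID": "OS BZ19-114", "Host identification": "Desmodus rotundus",
        "Test result": "positive", "Latitude": 17.2546}
bad = {"Sample ID": "OS BZ19-114", "Latitude": "17.2546"}
print(check_record(good, TOY_SCHEMA))
print(check_record(bad, TOY_SCHEMA))
```

Collecting every problem in a list, instead of stopping at the first, lets a researcher fix a dataset in one pass and then re-validate.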
The following diagram illustrates the complete workflow for standardizing and validating wildlife disease data using the wddsWizard package and JSON Schema:
Begin by installing the necessary packages in R. The wddsWizard package is available via GitHub, while its dependency jsonvalidate is available from CRAN:
Before validation, datasets must be structured according to the Wildlife Disease Data Standard template. The standard requires data in a "tidy" format where each row represents a single diagnostic test outcome [1] [12]. Researchers should:
Table 1: Required Data Fields in the Wildlife Disease Data Standard
| Field Name | Data Type | Description | Example |
|---|---|---|---|
| Sample ID | String | Unique identifier for the sample | "OS BZ19-114" |
| Host identification | String | Linnaean classification of host | "Desmodus rotundus" |
| Decimal latitude | Number | Geographic latitude in decimal degrees | 17.2546 |
| Decimal longitude | Number | Geographic longitude in decimal degrees | -88.7698 |
| Event date | String | Date of sample collection | "2019-03-15" |
| Diagnostic test name | String | Name of diagnostic test used | "coronavirus PCR" |
| Test result | String | Outcome of diagnostic test | "positive" |
| Test target | String | Pathogen or marker targeted | "Alphacoronavirus" |
| Parasite taxon | String | Identity of detected parasite | "Alphacoronavirus" |
With formatted data, researchers can proceed with the programmatic validation:
A successful validation generates a confirmation message, indicating the dataset complies with the standard. Failed validation returns detailed error messages that specify:
Researchers should systematically address each identified issue and re-validate until the dataset passes all checks.
Successful implementation of the data standard requires specific computational tools and resources. The following table details key components of the validation toolkit:
Table 2: Essential Research Reagents and Computational Resources
| Resource | Type | Function | Access Point |
|---|---|---|---|
| wddsWizard | R Package | Programmatic validation of data against WDDS | GitHub: viralemergence/wddsWizard |
| WDDS JSON Schema | Data Schema | Machine-readable specification of the standard | Included in wddsWizard package |
| JSON Validator (AJV) | Validation Engine | Core validation engine that checks data structure | Via jsonvalidate R package |
| CSV Templates | Data Template | Pre-formatted tables for data collection | GitHub: viralemergence/wdds |
| Controlled Vocabularies | Terminology | Standardized terms for specific fields | Ontology lookup services |
| PHAROS Database | Data Repository | Specialist platform for sharing validated data | pharos.viralemergence.org |
The validation framework is designed for flexibility across multiple wildlife disease research scenarios. The following diagram illustrates how researchers with different project types can implement the standard:
The standard is particularly valuable for including negative test results and contextual metadata that are often omitted from published studies but are essential for calculating accurate prevalence rates and understanding disease dynamics [1] [2]. By capturing these elements in a standardized format, the framework addresses a critical gap in wildlife disease data representation.
Successful implementation of the validation framework requires attention to several practical considerations. Researchers working with sensitive data related to endangered species or high-consequence pathogens should implement appropriate safeguards, which may include data obfuscation techniques for precise location information [2]. The standard supports these considerations while maintaining scientific utility.
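One simple way to reduce location precision is to round coordinates to fewer decimal places. The sketch below illustrates that technique only. The standard does not prescribe it, the `coarsen` and `cell_size_km` helpers are hypothetical, and the figure of roughly 111.32 km per degree of latitude is an approximation.

```python
import math


def coarsen(lat, lon, decimals=1):
    """Round coordinates to fewer decimal places to reduce location precision."""
    return round(lat, decimals), round(lon, decimals)


def cell_size_km(lat, decimals=1):
    """Approximate north-south and east-west size of one rounding cell, in km."""
    step = 10 ** (-decimals)           # cell width in degrees
    km_per_deg_lat = 111.32            # approximate length of one degree of latitude
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(lat))
    return step * km_per_deg_lat, step * km_per_deg_lon


lat, lon = coarsen(17.2546, -88.7698)
print(lat, lon)
print(cell_size_km(17.2546))
```

Reporting the approximate cell size alongside the rounded coordinates tells data users how much spatial resolution was deliberately given up.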
For free-text fields, researchers should utilize controlled vocabularies or ontologies where possible to enhance interoperability. Recommended resources for identifying appropriate terms include the OBO Foundry, the Ontology Lookup Service, and the NCBO BioPortal [30]. This practice maintains flexibility while promoting consistent terminology across datasets.
Once validated, datasets should be shared through appropriate repositories to maximize their utility and impact. Compatible platforms include open-access generalist repositories (e.g., Zenodo) and specialist platforms such as PHAROS [1].
When sharing data, researchers should include comprehensive project-level metadata using the DataCite Metadata Schema as recommended by the Generalist Repository Ecosystem Initiative [1] [12]. This practice ensures proper citation and discoverability of shared datasets.
The wddsWizard R package and JSON Schema validation framework provide an essential toolkit for implementing the minimum data standard in wildlife disease research. By offering a standardized, programmatic approach to data validation, these tools lower the technical barriers to adopting community standards while ensuring rigorous quality control. As adoption grows, these validated datasets will form an increasingly powerful foundation for synthetic analyses, ecological forecasting, and evidence-based decision-making at the interface of wildlife health and global security.
Researchers are encouraged to integrate these validation protocols early in their research workflows, ideally during the data management planning phase, to maximize efficiency and compliance with emerging best practices in reproducible wildlife disease science.
The surveillance of wildlife pathogens, particularly coronaviruses (CoVs) in bats, is a critical component of global One Health initiatives. Bats (Order: Chiroptera) are natural reservoirs for a vast diversity of CoVs, including both Alphacoronavirus and Betacoronavirus genera, and have been implicated in the emergence of several human diseases [43] [44]. Understanding the dynamics of bat-CoV interactions requires not only robust field and laboratory methodologies but also the consistent application of data reporting standards to ensure that findings are Findable, Accessible, Interoperable, and Reusable (FAIR) [12].
This application note details a comprehensive protocol for the detection and characterization of a novel alphacoronavirus in phyllostomid bats from Belize, framed within the context of a proposed minimum data standard for wildlife disease research [12]. We provide a detailed workflow, from field sampling and molecular diagnostics to data structuring, designed to facilitate data interoperability, reproducibility, and synthesis across studies.
The initial phase of the study involves the strategic collection of samples and associated host data from wild bat populations.
Table 1: Essential Host and Sample Metadata
| Variable | Data Type | Required | Descriptor |
|---|---|---|---|
| Host Identification | String | Yes | Linnaean classification (e.g., "Carollia sowelli") [12] |
| Sample ID | String | Yes | Unique identifier for the sample (e.g., "OS BZ19-114") [12] |
| Animal ID | String | No | Unique identifier for the individual animal [12] |
| Organism Sex | String | No | Sex of the animal [12] |
| Host Life Stage | String | No | Life stage (e.g., "juvenile", "adult") [12] |
| Live Capture | Boolean | No | Whether the animal was alive at capture [12] |
| Mass | Number | No | Body mass of the animal [12] |
| Mass Units | String | No | Units for mass (e.g., "g") [12] |
The following protocol outlines the steps for detecting alphacoronavirus RNA in collected swab samples.
Table 2: Primer Sequences for Pancoronavirus Nested RT-PCR
| Primer Name | Sequence (5' to 3') | Round |
|---|---|---|
| CHU1F | GGKTGGGAYTAYCCKAARTG | First |
| CHU1R | TGYTGTSWRCARAAYTCRTG | First |
| CHU2F | GGTTGGGACTATCCTAAGTGTGA | Nested |
| CHU2R | CCATCATCAGATAGAATCATCAT | Nested |
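The first-round primers contain IUPAC ambiguity codes (K, Y, R, S, W), so each one represents a pool of distinct oligonucleotides. The sketch below computes that pool size by multiplying the number of bases each code can stand for. The `degeneracy` helper is a hypothetical name, and the sequences are copied from the table above.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_DEGENERACY[base]
    return total


primers = {
    "CHU1F": "GGKTGGGAYTAYCCKAARTG",
    "CHU1R": "TGYTGTSWRCARAAYTCRTG",
    "CHU2F": "GGTTGGGACTATCCTAAGTGTGA",
    "CHU2R": "CCATCATCAGATAGAATCATCAT",
}
for name, seq in primers.items():
    print(name, degeneracy(seq))
# CHU1F has five two-fold positions (2**5 = 32 variants);
# CHU1R has seven (2**7 = 128); the nested primers are non-degenerate (1).
```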
To investigate the host immune response to CoV infection, serum proteomic profiling can be performed.
Adhering to the minimum data standard is crucial for making research data FAIR. The following workflow ensures data is collected, formatted, and documented correctly.
Diagram 1: Data Standardization Workflow
Required fields include Sample ID and Host identification [12].
This section lists key reagents and materials essential for executing the protocols described in this application note.
Table 3: Key Research Reagent Solutions
| Item | Function / Application | Example / Source |
|---|---|---|
| Mist Nets | Passive capture of bats for field sampling. | - |
| Viral Transport Medium (VTM) | Preservation of viral viability and nucleic acids in swab samples during transport. | - |
| Automated Nucleic Acid Extractor | High-throughput, consistent isolation of DNA/RNA from swab samples. | MAGMAX FLEX [43] |
| Pancoronavirus Primers | Detection of a broad range of coronaviruses via RT-nested PCR. | Chu et al. primers targeting RdRp gene [43] |
| One-Step RT-PCR Kit | Combined reverse transcription and PCR amplification for detection of RNA viruses. | SuperScript III One-Step RT-PCR System [43] |
| LC-MS/MS System | High-sensitivity identification and quantification of proteins in serum samples. | - |
| Heat Inactivation Protocol | Safety measure to inactivate potential pathogens in serum prior to proteomic analysis. | 56°C for 30 minutes [45] |
The application of the above protocols in Belize revealed an Alphacoronavirus prevalence of 22.22% to 36.36% across three phyllostomid bat species: Desmodus rotundus (vampire bat), Carollia sowelli (Sowell's short-tailed bat), and Sturnira parvidens (little yellow-shouldered bat) [44]. Phylogenetic analysis of the partial RdRp gene sequences placed the novel viruses within the Alphacoronavirus genus and suggested evolutionary relationships with human CoVs 229E and NL63 [44].
Proteomic analysis, while not detecting viral proteins in serum, identified 32 candidate protein biomarkers of CoV infection in vampire bats. Gene Ontology analysis of these biomarkers indicated that infected bats exhibited downregulation of the complement system and humoral immunity, alongside upregulation of neutrophil-mediated immunity and glutathione processes [45].
The following table demonstrates how the core findings from the Belize study can be structured according to the minimum data standard, ensuring interoperability.
Table 4: Example Data Record Structured per Minimum Standard
| Field | Example Value | Category |
|---|---|---|
| Sample ID | OS_DR_BZ21_001 | Sample |
| Animal ID | DR_BZ21_001 | Sample |
| Host Identification | Desmodus rotundus | Host (Required) |
| Organism Sex | Female | Host |
| Host Life Stage | Adult | Host |
| Live Capture | TRUE | Host |
| Mass | 35.5 | Host |
| Mass Units | g | Host |
| Test Result | Positive | Parasite |
| Target Gene | RdRp | Parasite |
| Parasite Genus | Alphacoronavirus | Parasite |
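Table 4 lists the fields vertically for readability. In a data file, each field becomes a column header and each diagnostic test becomes one row. The following sketch writes the example record to CSV with Python's standard `csv` module and reads it back. The values are those shown in Table 4, and the in-memory buffer stands in for a real file.

```python
import csv
import io

# The example record from Table 4, with every value kept as text for CSV output.
record = {
    "Sample ID": "OS_DR_BZ21_001",
    "Animal ID": "DR_BZ21_001",
    "Host Identification": "Desmodus rotundus",
    "Organism Sex": "Female",
    "Host Life Stage": "Adult",
    "Live Capture": "TRUE",
    "Mass": "35.5",
    "Mass Units": "g",
    "Test Result": "Positive",
    "Target Gene": "RdRp",
    "Parasite Genus": "Alphacoronavirus",
}

# Write a header row plus one data row into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back to confirm that nothing was lost in the round trip.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0] == record)
```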
This application note provides a detailed protocol for documenting a novel alphacoronavirus in bats, integrating rigorous experimental methods with a standardized data reporting framework. By adhering to the described workflows and the accompanying minimum data standard, researchers can generate data that is not only scientifically robust but also readily available for future synthesis and analysis. This approach is fundamental for advancing the field of wildlife disease ecology and enhancing our preparedness for zoonotic emergence.
The emergence of a minimum data standard for wildlife disease research addresses a critical deficiency in ecological and public health surveillance: the pervasive reliance on summary-only reporting [1]. This practice, where data are aggregated into descriptive tables, obscures the granular host-level and spatial-temporal details essential for robust analysis [12]. Such summarization has historically constrained the utility of wildlife disease data for meta-analyses, the development of predictive models for emerging zoonoses, and the assessment of global change impacts on disease dynamics [1] [2]. The new standard provides a structured framework for sharing disaggregated data, fundamentally enhancing data reusability, analytical flexibility, and actionable insight generation [1]. This analysis quantitatively and qualitatively compares the capabilities of the minimum data standard against traditional summary-only reporting, demonstrating its transformative potential for the field.
The minimum data standard introduces a comprehensive set of data and metadata fields designed to capture the full context of wildlife disease investigations [1] [2]. The table below summarizes the core quantitative differences in data reporting between the two approaches.
Table 1: Quantitative Comparison of Data Reporting Approaches
| Aspect | Minimum Data Standard | Summary-Only Reporting |
|---|---|---|
| Total Data Fields | 40 fields (9 required) [1] [2] | Typically < 10 fields (e.g., species, location, prevalence) [1] |
| Total Metadata Fields | 24 fields (7 required) [1] | Often minimal or absent |
| Reporting of Negative Results | Mandatory for each test [1] [2] | Often omitted or only positive results shared [1] |
| Data Disaggregation | Record-level (per test/sample), enabling host-level analysis [1] | Aggregated (e.g., prevalence per site/species), limiting analysis [1] |
| Spatial Granularity | Precise coordinates or detailed location descriptors [1] | Often broad regional descriptors [1] |
| Host-Level Detail | Sex, life stage, age, mass, etc. (13 fields) [12] | Rarely included; if so, only as summary statistics [1] |
Adopting the minimum data standard involves a structured process from data collection to sharing. The following protocol outlines the key steps for researchers.
Background: This protocol describes the procedure for formatting a raw wildlife disease dataset according to the minimum data standard, ensuring it is Findable, Accessible, Interoperable, and Reusable (FAIR) [1] [2].
Key Features
Materials and Reagents
JSON Schema or R package (wddsWizard) for data validation [1].
Procedure
Populate the fields Sample ID, Host identification, Analysis date, Analysis method, Analysis target, Result, Location, Location scale, and Country [1].
c. Critical Step: Include a record for every diagnostic test performed, including those with negative results [1] [2].
d. Use controlled vocabularies (e.g., from Darwin Core) for fields like Host life stage and Organism sex where possible to enhance interoperability [1] [12].
Data Analysis
The output of this protocol is a validated, "tidy" dataset where each row represents a single diagnostic test outcome [1]. This structure is immediately usable for a wide range of analyses in statistical software (e.g., R, Python) without the need for manual extraction or restructuring.
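As an illustration of this reuse, the sketch below groups record-level rows by host species and counts tests and positives. The rows are made up for the example and do not reproduce any study's results, and `summarize` is a hypothetical helper. The same function could group by any other recorded field, which is not possible once only summary tables are shared.

```python
from collections import defaultdict

# Made-up records for illustration; field names follow the protocol text.
rows = [
    {"Host identification": "Desmodus rotundus", "Result": "positive"},
    {"Host identification": "Desmodus rotundus", "Result": "negative"},
    {"Host identification": "Carollia sowelli", "Result": "negative"},
    {"Host identification": "Carollia sowelli", "Result": "negative"},
    {"Host identification": "Carollia sowelli", "Result": "positive"},
]


def summarize(rows, by):
    """Count tests and positives for each distinct value of the `by` field."""
    counts = defaultdict(lambda: {"tested": 0, "positive": 0})
    for row in rows:
        group = counts[row[by]]
        group["tested"] += 1
        group["positive"] += int(row["Result"] == "positive")
    return {key: dict(value) for key, value in counts.items()}


summary = summarize(rows, by="Host identification")
for species, stats in summary.items():
    print(species, stats["positive"], "/", stats["tested"])
```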
Validation of Protocol
This protocol is validated by its application to real-world datasets, such as a study of coronavirus in Belizean bats, which was successfully formatted and shared on the Pathogen Harmonized Observatory (PHAROS) platform [1].
General Notes and Troubleshooting
Fields that do not apply to a given test may be left blank (e.g., Gene target for non-PCR tests) [1].
The following diagram illustrates the logical workflow and comparative outcomes of applying the standard versus summary-only reporting.
Implementing the minimum data standard requires both conceptual and practical tools. The following table details key resources and their functions in the standardization process.
Table 2: Essential Reagents and Resources for Standardized Wildlife Disease Research
| Item / Resource | Function / Description | Critical Application in the Standard |
|---|---|---|
| Standard Template Files (.csv/.xlsx) | Pre-formatted files listing all 40 data fields [1]. | Provides the foundational structure for data entry, ensuring consistency across studies. |
| JSON Schema / R Validation Package | Machine-readable rule set and software for checking data compliance [1]. | Automates data quality control by verifying the presence of required fields and correct data types before repository submission. |
| Controlled Vocabularies & Ontologies | Standardized terminology (e.g., Darwin Core, NCBI Taxonomy) [1]. | Ensures interoperability by providing common names for host species, life stages, and other variables, preventing ambiguity. |
| Generalist Data Repository (e.g., Zenodo) | Platform for publishing and preserving research datasets with a persistent DOI [1] [2]. | Makes standardized data Findable and Accessible, fulfilling the FAIR principles and enabling citation. |
| Specialist Platform (e.g., PHAROS) | A database dedicated to wildlife disease data [1]. | Allows for advanced querying, visualization, and aggregation of standardized datasets from multiple studies. |
| Persistent Identifier (e.g., ORCID iD) | A unique identifier for researchers [2]. | Included in metadata to ensure unambiguous attribution for data creators, promoting a culture of data sharing. |
The convergence of biodiversity monitoring and human health surveillance represents a critical frontier in public health and ecological conservation. The emergence of zoonotic diseases underscores the intricate connections between ecosystem integrity and human health outcomes [46]. The Kunming-Montreal Global Biodiversity Framework and the Global Action Plan on Biodiversity and Health provide renewed impetus for developing integrated monitoring systems that can effectively track these complex relationships [46]. Despite this, a significant implementation gap persists, with limited adoption of standardized metrics that bridge these historically separate domains [46].
The absence of harmonized data standards severely hampers the ability to conduct secondary analyses, aggregate datasets across studies, and generate actionable insights for pandemic prevention and ecological health management [1] [2]. This application note addresses this critical gap by presenting a detailed protocol for implementing a minimum data standard specifically designed to enhance interoperability between wildlife disease research platforms and broader biodiversity data infrastructures. By adopting this standardized framework, researchers can significantly improve the findability, accessibility, interoperability, and reusability (FAIR) of wildlife disease data, thereby strengthening early warning systems for emerging health threats at the human-animal-environment interface [2].
The foundation for enhanced interoperability lies in implementing a consistent minimum data standard that captures essential information across sampling, host organisms, and parasite detection. The standard presented here builds on recent scientific consensus regarding the core data elements required for meaningful data integration and reuse in wildlife disease studies [1] [2] [4].
Table 1: Minimum Data Standard for Wildlife Disease Research
| Category | Field Name | Requirement Level | Description | Controlled Vocabulary Recommended |
|---|---|---|---|---|
| Sampling | Sampling Date | Required | Date of sample collection | ISO 8601 format |
| Sampling | Geographic Coordinates | Required | Latitude and longitude of sampling | Decimal degrees |
| Sampling | Diagnostic Method | Required | Test used for parasite detection | Open text |
| Host Organism | Host Species | Required | Taxonomic identification of host | GBIF Backbone Taxonomy |
| Host Organism | Life Stage | Conditional | Host developmental stage | Open text |
| Host Organism | Animal ID | Optional | Unique identifier for individual | Open text |
| Parasite/Pathogen | Test Result | Required | Outcome of diagnostic test | Positive/Negative/Inconclusive |
| Parasite/Pathogen | Parasite Species | Conditional | Taxonomic identification of parasite | GBIF Backbone Taxonomy |
| Parasite/Pathogen | GenBank Accession | Conditional | Identifier for genetic sequence data | GenBank format |
This standardized framework encompasses 40 data fields (9 required) and 24 metadata fields (7 required) that collectively document diagnostic outcomes at the finest possible spatial, temporal, and taxonomic resolution [1] [2]. A critical innovation of this standard is its mandatory inclusion of negative test results, which have historically been underrepresented in wildlife disease data despite their essential value for understanding disease prevalence and distribution [1] [2]. The standard is designed to accommodate diverse diagnostic methodologies, including PCR, ELISA, and pooled testing approaches, while maintaining structural consistency for data aggregation and analysis [1].
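As a rough illustration of what the required fields in Table 1 imply for a single record, the sketch below checks presence, date format, coordinate ranges, and the result vocabulary. The snake_case column names and the `check_record` helper are assumptions made for this example; the standard's own templates and JSON Schema define the authoritative names and rules.

```python
from datetime import date

# Hypothetical column names for this sketch; the standard's templates define the real ones.
REQUIRED_FIELDS = ("sampling_date", "latitude", "longitude",
                   "diagnostic_method", "host_species", "test_result")
ALLOWED_RESULTS = {"positive", "negative", "inconclusive"}


def check_record(record: dict) -> list:
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if record.get(name) in (None, "")]
    if problems:
        return problems  # the format checks below need every required value present

    try:
        date.fromisoformat(record["sampling_date"])
    except ValueError:
        problems.append("sampling_date is not an ISO 8601 calendar date")

    try:
        lat, lon = float(record["latitude"]), float(record["longitude"])
    except ValueError:
        problems.append("coordinates are not decimal degrees")
    else:
        if not -90 <= lat <= 90:
            problems.append("latitude outside [-90, 90]")
        if not -180 <= lon <= 180:
            problems.append("longitude outside [-180, 180]")

    if record["test_result"].strip().lower() not in ALLOWED_RESULTS:
        problems.append("test_result is not positive/negative/inconclusive")
    return problems


# Invented example records: one well-formed negative result, one with two format errors.
good = {"sampling_date": "2024-05-17", "latitude": "44.97", "longitude": "-93.26",
        "diagnostic_method": "PCR", "host_species": "Myotis lucifugus",
        "test_result": "Negative"}
bad = dict(good, sampling_date="17/05/2024", latitude="144.97")

print(check_record(good))
print(check_record(bad))
```

Note that the negative result passes like any other record, which is the point of the standard: non-detections are first-class rows, not omissions.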
The standard aligns with and extends existing biodiversity data frameworks, particularly the Darwin Core standard used by the Global Biodiversity Information Facility (GBIF), ensuring compatibility with broader biodiversity data infrastructures [1] [47]. This strategic alignment enables wildlife disease data to contribute to both health surveillance objectives and essential biodiversity variables (EBVs) tracking, effectively bridging the historical divide between public health and ecological monitoring frameworks [48] [46].
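To make the Darwin Core alignment concrete, the following sketch renames a few record keys to Darwin Core terms. The target terms (`eventDate`, `decimalLatitude`, `decimalLongitude`, `scientificName`, `lifeStage`, `organismID`, `associatedSequences`) are real Darwin Core terms, but the crosswalk itself and the snake_case source names are assumptions for illustration, not the standard's official mapping.

```python
# Assumed crosswalk from hypothetical column names to Darwin Core terms.
DWC_CROSSWALK = {
    "sampling_date": "eventDate",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "host_species": "scientificName",
    "life_stage": "lifeStage",
    "animal_id": "organismID",
    "genbank_accession": "associatedSequences",
}


def to_darwin_core(record: dict):
    """Split a record into Darwin Core-named fields and fields with no mapping here."""
    mapped = {DWC_CROSSWALK[key]: value
              for key, value in record.items() if key in DWC_CROSSWALK}
    unmapped = {key: value
                for key, value in record.items() if key not in DWC_CROSSWALK}
    return mapped, unmapped


record = {"sampling_date": "2024-05-17", "latitude": 44.97, "longitude": -93.26,
          "host_species": "Myotis lucifugus", "test_result": "negative"}
mapped, unmapped = to_darwin_core(record)
print(mapped)
print(unmapped)  # disease-specific fields such as test_result stay outside the crosswalk
```

Keeping the unmapped fields separate rather than dropping them reflects the idea that the standard extends Darwin Core with disease-specific information.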
This protocol provides a step-by-step framework for implementing the minimum data standard in wildlife disease research and surveillance programs. The workflow ensures consistent data collection, formatting, and sharing practices that enhance interoperability between biodiversity and health data platforms.
Table 2: Research Reagent Solutions and Essential Materials
| Item | Function | Implementation Context |
|---|---|---|
| Template Files (.csv, .xlsx) | Standardized structure for data recording | Pre-formatted templates ensure consistent implementation across studies [1] |
| JSON Schema Validator | Automated validation of data structure | Checks compliance with standard before data publication [1] |
| GBIF Backbone Taxonomy | Taxonomic normalization service | Ensures consistent species identification across datasets [1] [47] |
| PHAROS Database | Specialized repository for wildlife disease data | Platform for sharing standardized datasets with the community [1] |
| Generalist Repository (Zenodo) | FAIR-compliant data archive | Provides persistent identifiers and long-term preservation [1] [2] |
Project Evaluation and Planning
Data Collection and Standardization
Data Validation and Quality Control
Data Publication and Integration
The implementation of wildlife disease data standards achieves maximum impact when strategically aligned with existing biodiversity monitoring infrastructures and policy frameworks. Europe's evolving biodiversity monitoring landscape, characterized by the development of Thematic Hubs and the future European Biodiversity Observation Coordination Centre (EBOCC), offers a strategic pathway for this integration [49]. These expert-driven platforms serve as coordination mechanisms for specific biodiversity domains, facilitating structured dialogue and methodological alignment across monitoring communities [49].
The Biodiversa+ partnership has identified wildlife diseases as one of twelve priority areas for enhanced monitoring capacity, recognizing their significance for both ecosystem health and human health security [48]. This prioritization creates a strategic entry point for integrating standardized disease surveillance data into broader biodiversity observation networks. Furthermore, initiatives such as the OBIS-GBIF Joint Strategy for Marine Biodiversity Data (2025–2030) demonstrate practical frameworks for making biodiversity data more interoperable, accessible, and actionable for science and decision-making [47].
For effective policy integration, standardized wildlife disease data should be incorporated into National Biodiversity Strategies and Action Plans (NBSAPs) as countries work to update these documents in alignment with the Kunming-Montreal Global Biodiversity Framework [46]. This integration ensures that wildlife health monitoring becomes an institutionalized component of national biodiversity assessments rather than remaining a separate public health activity. The recently proposed specifications for cross-scale inclusion of harmonized biodiversity monitoring protocols provide practical guidance for achieving this integration through common minimum requirements for monitoring objectives, variables, sampling units, and reporting formats [49].
The implementation of standardized data practices for wildlife disease research generates significant scientific and policy benefits across multiple domains. The enhanced interoperability enables more robust secondary analyses and ecological synthesis research, supporting the investigation of macroecological patterns of pathogen distribution and the impacts of global change on disease dynamics [1] [46].
From a public health perspective, standardized data dramatically improves early warning systems for emerging zoonotic threats. The COVID-19 pandemic has underscored the urgent need for transparent, high-quality wildlife surveillance data that can be rapidly aggregated and analyzed to assess spillover risk [2]. By mandating consistent documentation of sampling context, host characteristics, and detection methods, the standard enables more accurate risk assessments and targeted surveillance interventions [1] [2].
For conservation and ecosystem management, integrating disease surveillance with biodiversity monitoring provides crucial insights into wildlife population health and the impacts of diseases on species of conservation concern [48] [46]. This integrated perspective is essential for implementing the One Health approach, which recognizes the interconnected health of humans, animals, plants, and ecosystems [46]. The standardized data facilitates the development of integrated science-based metrics that can quantify the environmental burden of disease and track progress toward achieving both biodiversity conservation and public health objectives [46].
Adoption barriers may include technical capacity limitations, concerns about data sensitivity, and institutional resistance to changing established practices. These challenges can be addressed through targeted training programs, clear guidance on ethical data sharing, and demonstration of the tangible benefits realized by early adopters [47] [2]. Growing requirements from funding agencies and scientific journals for FAIR data practices give researchers additional reason to adopt these standardized approaches [2].
The Minimum Data Standard for wildlife disease research is specifically designed to transform disparate, non-comparable datasets into a harmonized and interoperable resource. Its structure directly addresses the major bottlenecks in synthesis research by ensuring data is shared at the finest spatial, temporal, and taxonomic scale, and that critical contextual metadata is consistently reported [1]. This enables two primary forms of synthesis: meta-analyses, which quantitatively combine results from multiple studies, and predictive modeling, which uses aggregated data to forecast disease dynamics and inform management decisions.
The standard's requirement to include negative data (non-detections) is particularly crucial. Most historical studies only report positive detections or provide summarized prevalence data, making it impossible to recalculate true prevalence or compare across studies for a meta-analysis [1]. By providing disaggregated data, the standard allows for the recalculation of effect sizes and the investigation of heterogeneity, which are the cornerstones of a robust meta-analysis [50].
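A minimal sketch of why disaggregated rows matter: with one row per test, including negatives, prevalence and its uncertainty can be recomputed for any grouping. The column names and the invented counts below are assumptions for illustration; the Wilson score interval is one common choice of confidence interval, not something the standard prescribes.

```python
from collections import Counter
from math import sqrt


def wilson_interval(positives: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def prevalence_by_host(rows):
    """Count positives and conclusive tests per host species from tidy rows."""
    totals, positives = Counter(), Counter()
    for row in rows:
        result = row["test_result"].lower()
        if result not in ("positive", "negative"):
            continue  # inconclusive tests are excluded from the denominator here
        totals[row["host_species"]] += 1
        if result == "positive":
            positives[row["host_species"]] += 1
    return {host: (positives[host], totals[host]) for host in totals}


# Invented data: 15 positives, 85 negatives, 2 inconclusive tests for one species.
rows = ([{"host_species": "Myotis lucifugus", "test_result": "positive"}] * 15
        + [{"host_species": "Myotis lucifugus", "test_result": "negative"}] * 85
        + [{"host_species": "Myotis lucifugus", "test_result": "inconclusive"}] * 2)

for host, (pos, total) in prevalence_by_host(rows).items():
    low, high = wilson_interval(pos, total)
    print(f"{host}: {pos}/{total} = {pos / total:.2f} (95% CI {low:.3f} to {high:.3f})")
```

Without the 85 negative rows, the denominator would be unknown and neither the prevalence nor the interval could be recovered.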
For modeling, the key parameters required to initialize and parameterize frameworks (such as pathogenicity, host breadth, transmission pathways, and spatiotemporal location) are explicitly captured in the standard's fields [51]. This provides modelers with the foundational data needed to build predictive models early in an outbreak, even amidst uncertainty.
This protocol outlines the steps for utilizing datasets adhering to the Minimum Data Standard to conduct a systematic review and meta-analysis of wildlife disease effects.
- Host Species
- Location (Decimal Latitude and Longitude)
- Test Result and Test Date
- Diagnostic Method
- Sex, Age Class (where available) [1]

Table 1: Data Structure for Meta-Analysis of Pathogen Prevalence
| Study ID | Host Species | Positive Samples | Total Samples | Prevalence | Effect Size (SMD) | 95% CI | Weight (%) |
|---|---|---|---|---|---|---|---|
| Smith et al. 2020 | Myotis lucifugus | 15 | 100 | 0.15 | 0.45 | (0.21, 0.69) | 15.2 |
| Jones et al. 2021 | Eptesicus fuscus | 8 | 50 | 0.16 | 0.48 | (0.15, 0.81) | 10.1 |
| Lee et al. 2022 | Myotis lucifugus | 22 | 110 | 0.20 | 0.60 | (0.38, 0.82) | 16.5 |
| Overall Effect (Pooled) | | | | | 0.51 | (0.38, 0.64) | 100 |
Test whether moderators (e.g., Host Species, Age Class, Diagnostic Method) explain the heterogeneity in results.
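The pooling step behind a table like the one above can be sketched with fixed-effect inverse-variance weighting, with Cochran's Q and I² as heterogeneity summaries. This is an assumed, simplified method chosen for illustration (a random-effects model is often preferred in practice), and it uses only the three study rows shown, so its output is not expected to reproduce the table's pooled row or weights.

```python
from math import sqrt

Z95 = 1.96  # normal quantile used to convert a 95% CI into a standard error


def pool_fixed_effect(studies):
    """Inverse-variance pooling of (effect, ci_low, ci_high) tuples."""
    effects = [effect for effect, _, _ in studies]
    # Standard error is recovered from the CI width; weight is 1 / SE^2.
    weights = [1.0 / ((high - low) / (2 * Z95)) ** 2 for _, low, high in studies]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = sqrt(1.0 / total)
    # Cochran's Q and I^2 summarize between-study heterogeneity.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - Z95 * se, pooled + Z95 * se),
        "weights_pct": [100 * w / total for w in weights],
        "q": q,
        "i_squared": i_squared,
    }


# Effect sizes and 95% CIs copied from the three study rows of the table.
studies = [(0.45, 0.21, 0.69), (0.48, 0.15, 0.81), (0.60, 0.38, 0.82)]
result = pool_fixed_effect(studies)
print(f"pooled effect: {result['pooled']:.3f}")
print(f"95% CI: {result['ci'][0]:.3f} to {result['ci'][1]:.3f}")
print(f"I^2: {result['i_squared']:.2f}")
```

A moderator analysis would repeat this pooling within subgroups (for example per host species) and compare the subgroup estimates.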
This protocol describes how to use data formatted according to the Minimum Data Standard to parameterize and initialize predictive models for emerging wildlife diseases.
The Minimum Data Standard provides direct inputs for the five key characteristics needed for predictive modeling of disease systems [51].
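- Pathogenicity: Host Health Outcome and population-level effects.
- Transmission Pathways: Sample Type and Diagnostic Method.
- Taxonomic Host Breadth: Host Species entries.
- Host Social/Movement Behavior: Host Species ecology and Location data.
- Environmental Niche: Location and associated environmental data.

Table 2: Mapping Data Standard Fields to Key Modeling Parameters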
| Key Model Characteristic | Relevant Data Standard Fields | Model Parameter Example |
|---|---|---|
| Pathogenicity | Host Health Outcome, Parasite Load | Disease-induced mortality rate; recovery rate |
| Transmission Pathways | Sample Type, Diagnostic Method | Transmission rate (β); transmission matrix |
| Taxonomic Host Breadth | Host Species, Host Taxonomy | Number of susceptible host species; reservoir competence |
| Host Social/Movement Behavior | Host Species, Location | Contact rate; diffusion coefficient |
| Environmental Niche | Location, Test Date | Environmental suitability layer; seasonality factor |
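As one hedged example of the mapping in Table 2, the sketch below derives two simple model inputs from standard-formatted rows: the number of host species with at least one positive detection (a crude proxy for host breadth) and prevalence by calendar month (a starting point for a seasonality factor). The column names, helper functions, and rows are invented for illustration; real parameterization would need far more data and a defined model.

```python
from collections import defaultdict
from datetime import date


def host_breadth(rows) -> int:
    """Number of distinct host species with at least one positive test."""
    return len({row["host_species"] for row in rows
                if row["test_result"] == "positive"})


def monthly_prevalence(rows) -> dict:
    """Proportion of conclusive tests that were positive, keyed by month (1-12)."""
    counts = defaultdict(lambda: [0, 0])  # month -> [positives, conclusive tests]
    for row in rows:
        if row["test_result"] not in ("positive", "negative"):
            continue
        month = date.fromisoformat(row["test_date"]).month
        counts[month][1] += 1
        if row["test_result"] == "positive":
            counts[month][0] += 1
    return {month: pos / total for month, (pos, total) in sorted(counts.items())}


# Invented rows for illustration only.
rows = [
    {"host_species": "Myotis lucifugus", "test_date": "2023-01-12", "test_result": "negative"},
    {"host_species": "Myotis lucifugus", "test_date": "2023-06-03", "test_result": "positive"},
    {"host_species": "Eptesicus fuscus", "test_date": "2023-06-20", "test_result": "negative"},
    {"host_species": "Eptesicus fuscus", "test_date": "2023-07-02", "test_result": "positive"},
]

print(host_breadth(rows))
print(monthly_prevalence(rows))
```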
Table 3: Essential Resources for Implementing the Workflow
| Tool / Resource | Function / Description | Access Information |
|---|---|---|
| WDDS Wizard (R package) | Convenience functions to validate a dataset and its metadata against the JSON Schema implementing the data standard. | GitHub: github.com/viralemergence/wddsWizard [1] |
| Template Files (.csv/.xlsx) | Pre-formatted files containing the required and optional data fields, ensuring correct structure from the start of a project. | GitHub: github.com/viralemergence/wdds [1] |
| PHAROS Database | A dedicated platform for wildlife disease data, accepting submissions formatted according to the Minimum Data Standard. | Web: pharos.viralemergence.org [1] [2] |
| GBIF (Global Biodiversity Information Facility) | A global data infrastructure that allows sharing of and access to biodiversity data, compatible with the Darwin Core standard which aligns with this data standard. | Web: gbif.org [1] |
| WebAIM Color Contrast Checker | An online tool to verify that color contrasts in visualizations (e.g., model output diagrams) meet WCAG accessibility guidelines (AA level: 4.5:1). | Web: webaim.org/resources/contrastchecker/ [52] |
The minimum data standard for wildlife disease research establishes a unified framework for reporting data on pathogen detection in wild animals. Its core strength lies in a flexible, technology-agnostic structure that captures essential information regardless of the diagnostic method used, ensuring interoperability and long-term relevance [1] [2]. The standard comprises 40 core data fields (9 required) and 24 metadata fields (7 required), designed to document findings at the most granular spatial, temporal, and taxonomic levels possible [1]. This architecture allows researchers to maintain data consistency and reusability even as diagnostic technologies evolve, supporting the core FAIR principles (Findable, Accessible, Interoperable, and Reusable) that are critical for synthetic research and global health security [1] [2].
The standard's foundation is a "tidy data" format, where each row represents a single diagnostic measurement [1]. This simple rectangular structure (.csv) is inherently adaptable, capable of accommodating complex many-to-many relationships between samples, hosts, and tests that arise from advanced methodologies like repeated sampling, sample pooling, or confirmatory testing [1].
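The one-row-per-test layout can be illustrated with a small CSV in which one animal is sampled twice and one sample is tested twice. The column names and values are invented for this sketch, and pooled samples are deliberately left out because their exact representation should be taken from the standard's documentation.

```python
import csv
import io
from collections import defaultdict

# Invented tidy data: each row is a single diagnostic measurement.
TIDY_CSV = """sample_id,animal_id,host_species,diagnostic_test,test_result
S1,A1,Myotis lucifugus,PCR,positive
S1,A1,Myotis lucifugus,ELISA,positive
S2,A1,Myotis lucifugus,PCR,negative
S3,A2,Eptesicus fuscus,PCR,negative
"""

rows = list(csv.DictReader(io.StringIO(TIDY_CSV)))

tests_per_sample = defaultdict(list)   # confirmatory testing: several tests per sample
samples_per_animal = defaultdict(set)  # repeated sampling: several samples per animal
for row in rows:
    tests_per_sample[row["sample_id"]].append(row["diagnostic_test"])
    samples_per_animal[row["animal_id"]].add(row["sample_id"])

print(dict(tests_per_sample))
print({animal: sorted(samples) for animal, samples in samples_per_animal.items()})
```

Because the relationships live in repeated identifiers rather than in nested structures, the same flat file supports both groupings without any change to its shape.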
The table below summarizes the standard's core data fields, demonstrating how its organization supports diverse data types.
Table 1: Core Data Fields of the Wildlife Disease Data Standard
| Category | Field Example | Field Type | Required | Description |
|---|---|---|---|---|
| Sample Data | Sample ID | String | ✓ | Unique identifier for the sample [12]. |
| Sample Data | Animal ID | String | | Unique identifier for the individual host [12]. |
| Host Data | Host Identification | String | ✓ | Linnaean classification (e.g., Odocoileus virginianus) [12]. |
| Host Data | Organism Sex | String | | Sex of the host individual [12]. |
| Host Data | Host Life Stage | String | | Life stage (e.g., juvenile, adult) [12]. |
| Host Data | Mass / Length | Number | | Morphological data with relevant units [12]. |
| Parasite/Pathogen Data | Diagnostic Test | String | ✓ | Name of the diagnostic test (e.g., PCR, ELISA) [1]. |
| Parasite/Pathogen Data | Test Result | String | ✓ | Outcome of the test (e.g., positive, negative, Ct value) [1]. |
| Parasite/Pathogen Data | Pathogen Identity | String | | Taxonomy of the detected parasite/pathogen [1]. |
| Parasite/Pathogen Data | Gene Target / Probe Target | String | | Method-specific field (e.g., for PCR or ELISA) [1]. |
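The method-specific fields in the last table row lend themselves to a per-test completeness check. The sketch below flags PCR rows without primer or gene target details and ELISA rows without probe details. The snake_case names, the `METHOD_FIELDS` lookup, and the choice to treat these fields as expected are assumptions for illustration; the standard itself decides which fields are required or optional.

```python
# Assumed lookup of method-specific fields worth reviewing for each test type.
METHOD_FIELDS = {
    "PCR": ("forward_primer_sequence", "gene_target"),
    "ELISA": ("probe_target", "probe_type"),
}


def missing_method_fields(row: dict) -> list:
    """List method-specific fields that are absent or empty for this row's test."""
    test_name = (row.get("diagnostic_test") or "").strip().upper()
    expected = METHOD_FIELDS.get(test_name, ())  # unknown methods expect nothing here
    return [name for name in expected if not row.get(name)]


pcr_row = {"diagnostic_test": "PCR", "gene_target": "RdRp", "test_result": "negative"}
elisa_row = {"diagnostic_test": "ELISA", "probe_target": "IgG", "probe_type": "antibody"}
novel_row = {"diagnostic_test": "CRISPR-based lateral flow assay", "test_result": "positive"}

print(missing_method_fields(pcr_row))
print(missing_method_fields(elisa_row))
print(missing_method_fields(novel_row))
```

A test type that is not in the lookup passes untouched, which mirrors how open-text core fields let new technologies be recorded before any method-specific fields exist for them.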
The standard achieves flexibility through several key features:
- Diagnostic Test and Test Result are open-text, capturing essential outcomes from any current or future technology [1].
- PCR tests use fields like Forward primer sequence and Gene target, while ELISA tests use fields like Probe target and Probe type [1].

This protocol guides researchers in integrating data from emerging diagnostic platforms into the standard, ensuring consistency and reusability.
Objective: To completely and accurately report data from a novel diagnostic technology using the minimum data standard.
Pre-requisites:
Step-by-Step Procedure:
Determine Applicability
Map Data to Core Fields
Define Technology-Specific Parameters
- Diagnostic Test: Record the full, specific name of the new technology (e.g., "CRISPR-based lateral flow assay").
- Test Result: Define the result format (e.g., "positive/negative", "nanopore read count", "digital PCR count"). The standard accommodates diverse quantitative and qualitative results [1].
- Test Citation: Provide a citation (publication or manufacturer's protocol) detailing the principle and procedure of the novel method.

Capture Novel Metadata
Validate and Share Data
The following diagram illustrates the standard's core-periphery architecture, which allows it to remain stable while integrating new technologies.
Figure 1: The core-periphery architecture of the data standard enables its stability and extensibility. The central core of required fields remains constant, while technology-specific modules can be added or updated as diagnostics evolve.
The workflow for applying the standard, from study design to data sharing, is outlined below.
Figure 2: The five-step implementation workflow guides researchers from data collection to sharing, ensuring compliance with the standard.
The table below lists essential materials and resources for implementing the data standard in wildlife disease research.
Table 2: Essential Research Reagent Solutions and Resources
| Item | Function/Description | Example/Reference |
|---|---|---|
| Template Files | Pre-formatted .csv or .xlsx files ensuring correct data structure. | Available on GitHub: github.com/viralemergence/wdds [1]. |
| JSON Schema | Machine-readable schema to validate dataset structure and formatting. | Provided with the standard for automated validation [1]. |
| R Package wddsWizard | Convenience functions in R to validate data and metadata. | Available on GitHub: github.com/viralemergence/wddsWizard [1]. |
| Controlled Vocabularies & Ontologies | Standardized terms (e.g., for species, units) to enhance interoperability. | Encouraged use of existing ontologies; supporting information provides guidance [1]. |
| Data Repositories | Open-access platforms for sharing standardized data to ensure findability and reusability. | Generalist (e.g., Zenodo) or specialist (e.g., PHAROS, GBIF) repositories [1] [2]. |
The implementation of a minimum data standard for wildlife disease research marks a pivotal advancement for both ecological science and biomedical progress. By standardizing the collection and sharing of disaggregated data, including critically underreported negative results, this framework transforms fragmented findings into a cohesive, globally interoperable knowledge base. For drug development professionals and researchers, this enhances the ability to identify zoonotic threats early, understand pathogen ecology, and trace disease origins. Widespread adoption will strengthen the foundational infrastructure for pandemic prediction and prevention, ultimately supporting a more proactive and collaborative One Health approach to safeguarding human, animal, and environmental health. The future of wildlife disease research depends on data that is not just available, but truly actionable.