Long-Read Nanopore Sequencing for Parasite Genotyping: Advances, Applications, and Best Practices

Grayson Bailey Dec 02, 2025

Abstract

This article provides a comprehensive overview of Oxford Nanopore Technologies (ONT) long-read sequencing for parasite genotyping, tailored for researchers and drug development professionals. We explore the foundational principles enabling real-time, field-deployable sequencing of pathogen genomes, such as Plasmodium and Schistosoma. The scope covers diverse methodological applications, from whole-genome sequencing and targeted amplicon panels for drug resistance profiling to microbial community analysis. We detail essential troubleshooting for common challenges like homopolymer errors and sample contamination, and present rigorous validation data comparing Nanopore performance to Illumina sequencing. The synthesis aims to guide the effective implementation of this transformative technology in genomic surveillance and clinical diagnostics.

Unlocking Parasite Genomes: Core Principles and Field-Deployable Power of Nanopore Sequencing

This application note details the principles and protocols of Oxford Nanopore Technologies (ONT) sequencing, tracing the journey from the measurement of raw electrical signals to the generation of interpreted nucleotide sequences (basecalls). Framed within the context of parasite genotyping research, we elucidate how the technology's capacity for long reads, real-time analysis, and direct detection of base modifications provides powerful tools for overcoming challenges in characterizing complex parasitic genomes. The document provides a foundational understanding for researchers, scientists, and drug development professionals aiming to implement nanopore sequencing in their studies of parasitic diseases.

Nanopore sequencing is a third-generation sequencing technology that enables the direct, real-time analysis of long DNA or RNA fragments by measuring changes in an ionic current as nucleic acids pass through a protein nanopore [1]. Unlike second-generation sequencing, it does not require fragmentation, amplification, or fluorescent labeling, which allows for the preservation of base modifications and the sequencing of reads spanning tens to hundreds of kilobases [1] [2]. This capability is particularly advantageous for parasite genotyping research, where long reads are invaluable for assembling complex, repetitive genomes, resolving multi-gene families involved in immune evasion, and conducting haplotyping to distinguish recrudescence from new infections in clinical trials [3] [2].

The Core Technology: From Molecules to Signals

The fundamental process of nanopore sequencing can be broken down into several key stages, which transform a single molecule of DNA or RNA into a digital nucleotide sequence.

The Sequencing Process

The Nanopore and Flow Cell: Sequencing occurs on a flow cell containing an electrically resistant membrane embedded with biological nanopores. These are protein channels engineered to allow the passage of single-stranded nucleic acids [1].
The Motor Protein and Controlled Translocation: During library preparation, sequencing adapters are ligated to the DNA or RNA molecules. These adapters are pre-loaded with a motor protein, an enzyme that binds to the nanopore and controls the unwinding and ratcheting of the nucleic acid strand through the pore at a steady speed [1].
Signal Generation and the "Squiggle": The flow cell membrane is held under a constant voltage, creating an ionic current that flows through each nanopore. As a DNA or RNA strand passes through a nanopore, each nucleotide or k-mer (a short sequence of bases) causes a characteristic disruption in this current. MinKNOW, the instrument control software, records these disruptions, producing a raw electrical signal known as a "squiggle" [4] [5] [1]. This squiggle is the primary data output of the sequencing experiment.

Diagram 1: The core workflow of nanopore sequencing, from sample to sequence.

Basecalling: Decoding Squiggles into Bases

Basecalling is the computational process that interprets the raw squiggle data to determine the sequence of nucleotides. ONT's basecallers use sophisticated machine learning models to perform this translation.

Basecalling Algorithms and Neural Networks

ONT's production basecaller, Dorado, employs algorithms based on neural networks, specifically bi-directional Recurrent Neural Networks (RNNs) and transformer models [4] [5]. These computational networks are modeled loosely on the human brain, with layers of nodes that process information.

Training: The neural network models are trained on vast datasets where the raw electrical signals are matched to known reference sequences ("ground truth") [5]. This training enables the model to learn the complex relationship between a specific squiggle pattern and its corresponding DNA or RNA k-mer.
Inference: During sequencing, the trained model takes the incoming squiggle from MinKNOW and, leveraging its memory of past and future signal context, predicts the most likely sequence of bases [4] [5].

Basecalling Models and Accuracy

Dorado offers different basecalling models that balance speed and accuracy, allowing users to choose based on their experimental needs. The key metrics for these models are summarized below [5].

Table 1: Comparison of Oxford Nanopore Basecalling Models

Model	Description	Relative Speed	Typical Use Case
Fast	Designed for real-time analysis; can keep up with data generation on most devices.	Highest	Live basecalling during sequencing runs.
High Accuracy (HAC)	Provides higher raw read accuracy than the Fast model; more computationally intensive.	Medium	A balance of accuracy and speed for most applications.
Super Accurate (SUP)	Highest raw read accuracy; the most computationally intensive model.	Lowest	Post-sequencing basecalling for maximum accuracy.

Basecalling can be performed in real-time during the sequencing run ("live basecalling") or after the run is complete ("post-run basecalling") [5] [1]. The output of the basecaller is typically stored in standard file formats such as FASTQ (for base sequences and quality scores) or BAM (which can also include alignment information and modified base calls) [4] [5].
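To make the FASTQ output concrete, the following minimal Python sketch reads four-line FASTQ records and computes a mean quality score per read. It assumes the standard Phred+33 quality encoding and averages per-base error probabilities before converting back to a Q-score. This is a common convention, but it is not guaranteed to match Dorado's own read-level calculation exactly. The function names are illustrative and not part of any ONT tool.

```python
import math
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:].split()[0], seq, qual


def mean_qscore(qual: str) -> float:
    """Average the per-base error probabilities, then convert back to Phred."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


# Toy record: '5' encodes Q20 under Phred+33 (ASCII 53 - 33).
example = ["@read1 demo", "ACGT", "+", "5555"]
for read_id, seq, qual in parse_fastq(example):
    print(read_id, len(seq), round(mean_qscore(qual), 1))
```

Averaging error probabilities rather than raw Q-scores means that a handful of very poor bases pulls the read-level score down, which is usually the desired behaviour when filtering reads.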

Application in Parasite Genotyping: A Protocol for Distinguishing Malaria Recrudescence

A critical application of nanopore sequencing in parasitology is in Therapeutic Efficacy Studies (TES) for antimalarial drugs, where it is essential to distinguish between a recrudescence (treatment failure) and a new infection. The following protocol, adapted from a recent study, demonstrates a rapid, multiplexed nanopore amplicon sequencing (AmpSeq) approach for this purpose [3].

Experimental Workflow and Reagent Toolkit

The entire process, from sample to answer, can be completed in a short timeframe, leveraging the portability and speed of nanopore sequencing.

Diagram 2: Workflow for multiplexed amplicon sequencing to genotype P. falciparum.

Table 2: Research Reagent Solutions for Parasite Genotyping

Item	Function	Example from Protocol
Multiplex PCR Panel	Simultaneously amplifies multiple target loci from limited DNA, enabling high-throughput genotyping.	Panel of 6 polymorphic microhaplotype loci (ama1, celtos, cpmp, cpp, csp, and surfin1.1) [3].
Native Barcoding Kit	Allows for the pooling of multiple samples in a single sequencing run by tagging each with a unique barcode sequence.	Native Barcoding Kit 96 V14 (SQK-NBD114.96) [3].
Flow Cell	The consumable containing nanopores for sequencing. Pore version influences data quality.	R10.4.1 flow cell, which has a dual reader improving accuracy in homopolymer regions [4] [3].
Basecalling Software	The algorithm that converts raw electrical signals into nucleotide sequences. Model choice affects accuracy.	Dorado basecaller (v0.8.2) with the super-accurate (sup) model [3].

Key Experimental and Analytical Methodologies

Sensitive Detection of Minority Clones: The protocol was validated using mixtures of different P. falciparum laboratory strains (e.g., 3D7, K1, HB3) at defined ratios. The assay demonstrated high sensitivity, reliably detecting a minority clone present at a ratio of 1:100:100:100, which is crucial for identifying minor parasite populations in polyclonal infections [3].
Bioinformatic Analysis: A custom bioinformatics pipeline is used to infer haplotypes from the basecalled reads. The process involves stringent filtering—basecalling with Dorado using a minimum Q-score of 20 (≥99% accuracy) to minimize errors—followed by rigorous cutoff criteria to distinguish true haplotypes from false positives caused by sequencing errors [3].
Distinguishing Recrudescence: In the paired patient samples (Day 0 and day of recurrence), a recrudescence is identified if one or more haplotypes are found in both samples. A new infection is concluded if all haplotypes in the recurrent sample are different from those in the Day 0 sample. The nanopore AmpSeq assay consistently distinguished recrudescence from new infections in 17 out of 20 cases (85%) [3].
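The classification rule described above can be sketched in a few lines of Python. This is a hedged illustration, not the cited study's pipeline: the haplotype labels are invented, the "undetermined" outcome for missing data is an assumption, and the rule is applied per marker because the text does not state how calls are combined across the six markers.

```python
from typing import Dict, Set


def classify_marker(day0: Set[str], recurrence: Set[str]) -> str:
    """Apply the shared-haplotype rule to one marker."""
    if not day0 or not recurrence:
        return "undetermined"   # assumption: no call without haplotypes in both samples
    if day0 & recurrence:
        return "recrudescence"  # at least one haplotype is present at both time points
    return "new infection"      # every recurrent haplotype differs from Day 0


def classify_pair(day0: Dict[str, Set[str]],
                  recurrence: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify each marker seen in either sample of a Day 0 / recurrence pair."""
    markers = sorted(set(day0) | set(recurrence))
    return {m: classify_marker(day0.get(m, set()), recurrence.get(m, set()))
            for m in markers}


# Hypothetical haplotype labels, for illustration only.
day0 = {"ama1": {"ama1-H1", "ama1-H2"}, "cpmp": {"cpmp-H7"}}
recurrence = {"ama1": {"ama1-H2"}, "cpmp": {"cpmp-H9"}}
print(classify_pair(day0, recurrence))
```

In this toy pair the two markers disagree, which is exactly the situation where a study-specific rule for combining markers is needed.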

Performance Metrics

The performance of the nanopore genotyping assay, as reported in the cited study, is summarized below [3].

Table 3: Performance Metrics of the Nanopore Amplicon Sequencing Assay

Performance Metric	Result	Implication for Parasite Genotyping
Sensitivity	Detection of minority clones at 1:100:100:100 ratio.	High capability to detect low-abundance clones in polyclonal infections.
Specificity	False-positive haplotypes < 0.01%.	High confidence in called haplotypes, reducing false conclusions.
Reproducibility	Intra-assay: 98%; Inter-assay: 97%.	Robust and consistent results across technical and experimental replicates.
Genetic Diversity	High across markers (e.g., cpmp He=0.99, 28 haplotypes).	High power to discriminate between genetically distinct parasite strains.
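For readers unfamiliar with the He statistic in the table, the sketch below computes expected heterozygosity from a list of observed haplotypes. It assumes the standard sample-size-corrected estimator, He = n/(n-1) × (1 - Σ p_i²). The cited study does not state which estimator it used, so treat this as an illustration of the concept rather than a reproduction of its calculation.

```python
from collections import Counter
from typing import Iterable


def expected_heterozygosity(haplotypes: Iterable[str]) -> float:
    """Sample-size-corrected expected heterozygosity for one marker."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two observations")
    sum_sq = sum((c / n) ** 2 for c in counts.values())  # sum of squared haplotype frequencies
    return (n / (n - 1)) * (1 - sum_sq)


# Hypothetical observations: four haplotypes, one of them seen twice.
observed = ["H1", "H2", "H3", "H4", "H1"]
print(round(expected_heterozygosity(observed), 3))
```

A value close to 1 means that two randomly drawn parasites almost always carry different haplotypes at that marker, which is what gives a marker its discriminatory power.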

The journey from squiggles to basecalls encapsulates a powerful and versatile sequencing paradigm. For researchers in parasitology and drug development, nanopore technology offers a unique combination of long reads, real-time data access, portability, and direct modification detection. As demonstrated in the protocol for malaria genotyping, these features enable rapid, accurate, and detailed genetic analysis that is directly applicable to critical public health challenges, such as monitoring antimalarial drug efficacy. The continuous improvements in basecalling accuracy and sequencing chemistry promise to further solidify the role of this technology in advancing parasite genotyping research.

Parasitic diseases caused by organisms like Plasmodium falciparum and Schistosoma mansoni remain a major global health challenge, with malaria alone causing an estimated 263 million cases and 597,000 deaths annually [6]. The complex, repetitive genomes of these parasites have historically complicated genetic studies aimed at understanding drug resistance, transmission dynamics, and virulence mechanisms. Short-read sequencing technologies, while highly accurate for single nucleotide variants, consistently fail to resolve structural variants (SVs), repetitive regions, and complex gene families that are prevalent in parasite genomes [2]. Long-read sequencing technologies, particularly from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), are now overcoming these limitations by generating reads spanning tens to hundreds of kilobases, enabling complete genome assemblies and comprehensive variant detection [2] [7]. This application note details how long-read sequencing provides critical advantages for parasite genotyping research, with specific protocols and data to guide researchers and drug development professionals.

Advantages of Long-Read Sequencing for Parasite Genomics

Long-read sequencing technologies offer several transformative capabilities for parasite research. They enable complete characterization of structural variants (>50 bp) including deletions, duplications, insertions, inversions, and translocations that significantly impact phenotype [8]. A recent study on Schistosoma mansoni populations identified 17,446 SVs representing 6.5% of the genome, with 168 population-specific SVs at-or-near fixation that impact coding sequences [8].

These technologies also resolve complex resistance mechanisms by covering entire gene regions rather than predefined hotspots. This is particularly valuable for tracking antimalarial resistance, where mechanisms extend beyond Pfk13 mutations to include emerging resistance genes like Pfcoronin, Pfubp1, and Pfap2μ [6]. Additionally, long-read sequencing enables real-time, portable genomic surveillance in resource-limited settings through portable devices like the MinION, providing rapid turnaround from sample to result [2] [3].

The following tables summarize key performance metrics and structural variant data from recent studies applying long-read sequencing to parasite genomics.

Table 1: Performance Metrics of Long-Read Sequencing in Parasite Studies

Application	Sensitivity	Specificity/False Positive Rate	Coverage/Uniformity	Cost per Sample
P. falciparum Drug Resistance Surveillance [6]	50 parasites/μL (DBS), 5 parasites/μL (venous blood)	Species-specific with undetectable cross-reactivity	100% target coverage at thresholds; >89% uniformity for VB	$15.60
P. falciparum Recrudescence Detection [3]	Minority clones detected at 1:100:100:100 ratios	False-positive haplotypes < 0.01%	Uniform coverage across 6 microhaplotype markers	Not specified
S. mansoni Structural Variant Characterization [8]	Low-frequency variants remain challenging to identify	Precise breakpoint mapping for 17,446 SVs	SVs span 6.5% of the genome	Not specified

Table 2: Structural Variant Distribution in Schistosoma mansoni Populations [8]

Variant Type	Count	Percentage of Total	Genomic Features
Deletions	8,525	48.9%	Enriched in repeat regions
Insertions	8,410	48.2%	Enriched in repeat regions
Inversions	311	1.8%	Impact regulatory regions
Duplications	131	0.8%	Often involve gene copies
Translocations	69	0.4%	Affect chromosomal architecture
Population Distribution
Shared (≥4 populations)	10,293	59%	Conserved across populations
Population-specific	2,093	12%	Potential local adaptation
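As a quick consistency check, the percentages in the table can be recomputed from the counts. The short Python sketch below uses only the counts listed above; the variable names are illustrative.

```python
# Counts per structural variant type, copied from the table above.
sv_counts = {
    "Deletions": 8525,
    "Insertions": 8410,
    "Inversions": 311,
    "Duplications": 131,
    "Translocations": 69,
}

total = sum(sv_counts.values())
shares = {name: round(100 * n / total, 1) for name, n in sv_counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The per-type counts sum to the 17,446 SVs reported in the text, and deletions and insertions together account for roughly 97% of all calls.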

Experimental Protocols

Protocol 1: Comprehensive Molecular Surveillance of Antimalarial Resistance

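This protocol enables full-gene sequencing of known and emerging antimalarial resistance markers in P. falciparum [6]. Note that, as written, it amplifies ~2.5 kb targets but sequences the resulting libraries on an Illumina NovaSeq 6000 (2×150 bp) rather than on a nanopore device.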

Research Reagent Solutions

Table 3: Essential Materials for Antimalarial Resistance Surveillance

Item	Function	Specifications/Alternatives
QIAamp DNA Mini Kit	Genomic DNA extraction from blood samples	Suitable for low parasitemia samples
UCP Multiplex PCR Kit	Amplification of multiple targets simultaneously	Ensures balanced amplification of 6 targets
VAHTS Universal Pro DNA Library Prep Kit	Illumina library preparation	Compatible with long amplicons
Custom Primer Panel	Targets resistance markers	Covers Pfk13, Pfcoronin, Pfap2μ, Pfubp1, Pfmdr1, Pfcrt
Illumina NovaSeq 6000	High-throughput sequencing	2×150 bp chemistry recommended

Step-by-Step Procedure

Primer Panel Design:
- Select six target genes: Pfk13, Pfcoronin, Pfap2μ, Pfubp1, Pfmdr1, and Pfcrt.
- Standardize amplicon sizes to 2.5 ± 0.2 kb using Multiply software to minimize amplification bias.
- Design for full-length coverage of Pfk13, Pfcoronin, and Pfap2μ, with partial sequences covering all known resistance-associated loci for other genes.
DNA Extraction:
- Extract gDNA from dried blood spots (DBS) or venous blood using QIAamp DNA Mini Kit.
- Elute in nuclease-free water and quantify using fluorometric methods.
Multiplex PCR Optimization:
- Use 4 μL of gDNA template in 20 μL multiplex PCR reactions.
- Iteratively optimize primer concentrations and annealing temperatures through gel electrophoresis and sequencing validation.
- Aim for detection thresholds of ≤50 parasites/μL in DBS and ≤5 parasites/μL in venous blood samples.
Library Preparation and Sequencing:
- Clean multiplex PCR products using 0.6× ratio of QIAseq Beads.
- Prepare libraries using VAHTS Universal Pro DNA Library Prep Kit.
- Sequence on Illumina NovaSeq 6000 platform with 2×150 bp chemistry.
- Allocate 0.25 GB of sequencing data for DBS samples (>50 parasites/μL) and 0.5 GB for venous blood samples (>5 parasites/μL).
Bioinformatic Analysis:
- Quality control and filter raw reads using fastp.
- Remove reads mapping to human reference genome.
- Align reads to P. falciparum reference and call variants across all target genes.

Protocol 2: Distinguishing Recrudescence from New Infection in Antimalarial Trials

This protocol uses nanopore sequencing of microhaplotypes to differentiate treatment failure from new infections in therapeutic efficacy studies [3].

Research Reagent Solutions

Table 4: Essential Materials for Recrudescence Detection

Item	Function	Specifications/Alternatives
MinION Mk1C	Portable sequencing device	Enables real-time analysis in field settings
Native Barcoding Kit 96 V14	Sample multiplexing	Allows processing of 96 samples simultaneously
R10.4.1 Flow Cells	Nanopore sequencing	Provides high accuracy reads
Custom 6-plex PCR Panel	Amplification of microhaplotypes	Targets ama1, celtos, cpmp, cpp, csp, surfin1.1

Step-by-Step Procedure

Sample Collection and Preparation:
- Collect paired blood samples before treatment (day 0) and at recurrence.
- Extract gDNA ensuring concentration suitable for amplification (≥10 parasites/μL).
Multiplex PCR:
- Amplify six polymorphic microhaplotype loci (ama1, celtos, cpmp, cpp, csp, and surfin1.1) using optimized primer concentrations.
- Use previously published primer sequences with modifications for uniform amplification.
Library Preparation:
- Prepare libraries using Native Barcoding Kit 96 V14 according to manufacturer's instructions with modifications for parasite DNA.
- Load onto MinION Mk1C with R10.4.1 flow cells.
Sequencing and Basecalling:
- Sequence using MinKNOW software (v24.06.15 or later).
- Perform simplex basecalling and double-ended demultiplexing with Dorado (v0.8.2) using the super-accurate (sup) model.
- Set minimum Q-score for passing reads to 20 (accuracy ≥99%).
- Target approximately 25,000 reads per marker per sample (6 markers × 25,000 = 150,000 reads per sample).
Haplotype Inference:
- Use custom bioinformatics workflow to infer haplotypes from polyclonal infections.
- Apply rigorous cutoff criteria for accurate haplotype calling.
- Compare pre- and post-treatment haplotypes to distinguish recrudescence (identical haplotypes) from new infections (different haplotypes).
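The Q-score threshold and read targets in the sequencing step reduce to simple arithmetic, sketched below in Python. The Phred relationship (accuracy = 1 - 10^(-Q/10)) is standard. The helper names are illustrative, and the 96-sample figure simply assumes a full barcoding plate.

```python
def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to the implied per-base accuracy."""
    return 1 - 10 ** (-q / 10)


def reads_required(markers: int, reads_per_marker: int, samples: int = 1) -> int:
    """Total passing reads needed to hit the per-marker target in every sample."""
    return markers * reads_per_marker * samples


print(f"Q20 accuracy: {phred_to_accuracy(20):.2%}")
print("Reads per sample:", reads_required(6, 25_000))
print("Reads per 96-sample run:", reads_required(6, 25_000, samples=96))
```

Only reads that pass the Q-score filter count towards these targets, so the raw read yield of a run needs to be somewhat higher than the totals printed here.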

Workflow and Application Diagrams

Diagram Title: End-to-End Parasite Genotyping Workflow

Diagram Title: Research Applications and Impact of Long-Read Sequencing

Long-read sequencing technologies have revolutionized parasite genomics by overcoming the limitations of short-read approaches for complex, repetitive genomes. The protocols and data presented demonstrate how researchers can leverage these technologies for comprehensive antimalarial resistance surveillance, accurate distinction of recrudescence from new infections, and population-level structural variant analysis. As these technologies continue to evolve with improvements in accuracy, portability, and cost-effectiveness, they will play an increasingly critical role in accelerating drug development, informing treatment strategies, and supporting global efforts to control and eliminate parasitic diseases. The integration of long-read sequencing into routine parasite surveillance represents a transformative advance with the potential to significantly impact public health outcomes in endemic regions.

Long-read nanopore sequencing has revolutionized parasite genotyping research by enabling real-time, portable genomic analysis. The compact, USB-powered MinION device from Oxford Nanopore Technologies (ONT) has been pivotal in shifting molecular surveillance from centralized laboratories to field settings [9]. This transition is particularly critical for tracking parasitic diseases like malaria, where rapid identification of drug-resistant strains or distinction between recrudescence and new infection directly impacts treatment efficacy and public health responses [3]. The technology's capacity for real-time data analysis and direct RNA/DNA sequencing without amplification bypasses the logistical and temporal constraints of conventional sequencing, offering researchers and drug development professionals unprecedented flexibility in study design and implementation [9] [10].

Table: Key Characteristics of Portable Nanopore Sequencing for Parasite Genotyping

Feature	Specification/Advantage	Application in Parasite Research
Device Portability	Palm-sized (stapler dimensions); USB-powered [9]	Deployment in remote field sites for malaria surveillance [11]
Data Delivery	Real-time data streaming; no fixed run time [9]	Adaptive sampling; stop sequencing once sufficient data is obtained [3]
Read Length	Short to ultra-long reads (record: >4 Mb) [9]	Phasing of complex parasite genomes; spanning repetitive regions [3]
Library Preparation	As fast as 10 minutes with rapid kits [9]	Rapid turnaround from sample to answer for time-sensitive studies [3]
Workflow Simplicity	Automated prep available (VolTRAX); minimal pipetting [9] [11]	Accessible for non-specialist users in low-resource settings [11]

Technical Specifications and Portable Sequencing Solutions

The MinION platform's technical design is inherently suited for decentralization. Unlike large, fixed-installation sequencers, MinION is a pocket-sized, USB-powered device that facilitates "analysis to the sample" [9]. This form factor has been proven in extreme environments, from the Antarctic to the International Space Station [9]. For large-scale projects, the GridION and PromethION systems offer scalable throughput while maintaining the core advantages of nanopore sequencing [9]. The Flongle adapter provides an ultra-low-cost flow cell option for smaller, routine tests, making it ideal for targeted parasite genotyping assays where cost-per-sample is a critical factor [9].

The core technology involves passing a strand of DNA or RNA through a protein nanopore embedded in an electro-resistant membrane. Each nucleotide base disrupts the electrical current in a characteristic way, producing a unique "squiggle" that is decoded into sequence data in real-time [10]. This direct electronic analysis allows for the sequencing of native DNA and RNA, thereby preserving base modifications and eliminating amplification bias—a crucial feature for accurate genotyping and epigenetic studies in parasites [9] [10].

Application Note: Rapid Multiplexed Amplicon Sequencing for Plasmodium falciparum

Background and Objective

Therapeutic efficacy studies (TES) for antimalarial drugs require molecular correction to distinguish between treatment failure (recrudescence) and new infections. A recent study demonstrated the use of a multiplexed nanopore Amplicon Sequencing (AmpSeq) assay to provide rapid, corrected drug efficacy estimates directly in field-relevant settings [3]. The objective was to develop a robust, portable genotyping method capable of detecting minority clones in polyclonal infections with high sensitivity and specificity, overcoming the limitations of traditional capillary electrophoresis [3].

Experimental Protocol

Sample Preparation and DNA Extraction

Sample Type: Use paired patient whole blood samples (Day 0 and day of recurrence) or dried blood spots (DBS) [3] [11].
DNA Extraction: Perform extraction using a commercial kit (e.g., QIAamp UCP Pathogen Mini Kit). For DBS, optimize protocols for low parasitemia [11] [12].

Multiplex PCR Amplification

Targets: A 6-plex PCR panel targeting highly polymorphic microhaplotype loci (ama1, celtos, cpmp, cpp, csp, and surfin1.1) [3].
Primer Design: Use previously published outer primers [3]. The panel was optimized for uniform amplification efficiency.
Reaction Setup:
- Primers: 200 nM for each inner primer pair.
- Master Mix: Use a robust hot-start ready mix (e.g., KAPA2G Robust HotStart ReadyMix).
- Cycling Conditions: Initial denaturation at 95°C for 3 min; 30 cycles of 95°C for 15 s, 55°C for 15 s, and 72°C for 30 s [3] [12].

Library Preparation and Sequencing

Kit: Native Barcoding Kit 96 (SQK-NBD114.96) with modified protocol [3].
Device: MinION Mk1C with R10.4.1 flow cell [3].
Software: MinKNOW for sequencing run control [3].
Sequencing Depth: Target approximately 25,000 reads per marker per sample. Sequencing is stopped once the desired depth is achieved, leveraging real-time analysis [3].
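The "stop once the desired depth is achieved" rule can be expressed as a simple check over per-sample, per-marker read counts. The Python sketch below is a minimal illustration. How the counts are obtained during a run (for example by demultiplexing and mapping reads as they are produced) is outside its scope, and the sample and marker names are hypothetical.

```python
from typing import Dict, List, Tuple

TARGET_READS = 25_000  # per marker per sample, as stated in the protocol

Counts = Dict[Tuple[str, str], int]  # (sample, marker) -> passing reads so far


def under_target(counts: Counts, target: int = TARGET_READS) -> List[Tuple[str, str]]:
    """Return the (sample, marker) pairs that are still below the read target."""
    return sorted(key for key, n in counts.items() if n < target)


def can_stop(counts: Counts, target: int = TARGET_READS) -> bool:
    """True once every tracked (sample, marker) pair has reached the target."""
    return not under_target(counts, target)


# Hypothetical mid-run snapshot.
snapshot = {("S01", "ama1"): 26_100, ("S01", "csp"): 12_400, ("S02", "ama1"): 25_000}
print(can_stop(snapshot), under_target(snapshot))
```

Every expected sample and marker combination should be present in the dictionary, with a count of zero if necessary, because a missing key would otherwise be silently treated as complete.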

Data Analysis

Basecalling & Demultiplexing: Perform simplex basecalling and double-ended demultiplexing using Dorado (v0.8.2) with the super-accurate model and a minimum Q-score of 20 (≥99% accuracy) [3].
Haplotype Inference: Use a custom bioinformatics workflow to infer haplotypes from polyclonal infections, applying rigorous cutoff criteria for accurate haplotype calling and minority clone detection [3].
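The cited study's haplotype-calling workflow and cutoff values are not reproduced here. As a rough illustration of what cutoff criteria can look like, the Python sketch below counts identical amplicon sequences for one marker in one sample and keeps only those that pass a minimum read count and a minimum within-sample frequency. Both thresholds (`min_reads=10`, `min_freq=0.002`) are hypothetical placeholders, not the published values.

```python
from collections import Counter
from typing import Dict, Iterable


def call_haplotypes(reads: Iterable[str],
                    min_reads: int = 10,       # hypothetical cutoff
                    min_freq: float = 0.002    # hypothetical cutoff
                    ) -> Dict[str, float]:
    """Return {haplotype_sequence: within-sample frequency} for sequences passing both cutoffs."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total
            for seq, n in counts.items()
            if n >= min_reads and n / total >= min_freq}


# Toy input: one dominant haplotype, one minority clone, one likely error sequence.
reads = ["ACGTAC"] * 9950 + ["ACGTTC"] * 40 + ["ACCTAC"] * 10
print(call_haplotypes(reads))
```

A real workflow would also need to handle read trimming, alignment to the marker reference, and systematic errors such as homopolymer miscalls before sequences are counted.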

Key Findings and Performance Metrics

The nanopore AmpSeq assay demonstrated high performance in both laboratory validation and analysis of clinical samples, confirming its suitability for rapid parasite genotyping [3].

Table: Performance Metrics of the Nanopore AmpSeq Assay [3]

Performance Metric	Result	Significance
Sensitivity	Detection of minority clones at a ratio of 1:100:100:100	High sensitivity for detecting minor clones in polyclonal infections.
Specificity	False-positive haplotypes < 0.01%	High confidence in haplotype calls.
Reproducibility	Intra-assay: 98%; Inter-assay: 97%	Robust and consistent performance across runs.
Marker Diversity	Highest for cpmp (He=0.99; 28 unique haplotypes)	High power to discriminate between strains.
Concordance in Paired Samples	17/20 cases (85%) successfully classified	Reliable distinction between recrudescence and new infection.
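To put the sensitivity figure in context, a mixture ratio can be converted into the share each strain contributes. The Python sketch below does this with exact fractions. It assumes the ratio describes relative amounts of each strain in the mixture, and the function name is illustrative.

```python
from fractions import Fraction
from typing import List


def mixture_fractions(ratio: str) -> List[Fraction]:
    """Convert a ratio such as '1:100:100:100' into each component's share of the total."""
    parts = [int(p) for p in ratio.split(":")]
    total = sum(parts)
    return [Fraction(p, total) for p in parts]


fractions_ = mixture_fractions("1:100:100:100")
minority = min(fractions_)
print(minority, f"= {float(minority):.2%} of the mixture")
```

At a ratio of 1:100:100:100 the minority clone makes up 1/301 of the mixture, or about 0.33%. At the target depth of 25,000 reads per marker this corresponds to roughly 80 reads.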

The Scientist's Toolkit: Essential Research Reagent Solutions

Successful implementation of portable nanopore sequencing for parasite genotyping relies on a defined set of reagents and tools. The following table details the essential components for establishing this workflow.

Table: Essential Research Reagent Solutions for Parasite Genotyping via Nanopore Sequencing

Item	Function/Description	Example Product/Kit
Portable Sequencer	Palm-sized device for DNA/RNA sequencing; USB-powered.	MinION [9]
Integrated Device	Portable device with onboard compute for sequencing and analysis.	MinION Mk1C [9]
Library Prep Kit	For barcoding and preparing amplified DNA libraries for sequencing.	Native Barcoding Kit 96 (SQK-NBD114.96) [3]
Flow Cell	Disposable cartridge containing nanopores for sequencing.	R10.4.1 Flow Cell [3]
DNA Extraction Kit	For purifying high-quality genomic DNA from complex samples.	QIAamp UCP Pathogen Mini Kit [12]
Hot-Start PCR Mix	For specific and efficient multiplex amplification of target loci.	KAPA2G Robust HotStart ReadyMix [12]
Bioinformatics Tools	Software for basecalling, demultiplexing, and haplotype analysis.	Dorado basecaller, MinKNOW [3]

Implementation and Best Practices for Field Deployment

Deploying nanopore sequencing in field settings requires careful planning. The following guidelines ensure robust and reliable results:

Workflow Validation: Prior to field deployment, validate the entire workflow—from DNA extraction to final analysis—with well-characterized control samples in the central lab to establish baseline performance [3].
Rapid Turnaround: Leverage real-time analysis to monitor sequencing progress. Experiments can be stopped as soon as sufficient data is collected, drastically reducing time-to-result [9] [3]. One malaria surveillance study achieved a turnaround of under 29 hours from DNA extraction to results [11].
Data Management: The MinION Mk1C offers an integrated solution for sequencing and analysis. For offline environments, ensure all necessary software, databases, and bioinformatics pipelines are installed and tested beforehand [9] [3].
Quality Control: Include both positive and negative controls in every sequencing run. Use defined criteria for read quality (e.g., Q-score ≥20) and assay performance (e.g., minimum coverage per amplicon) to ensure data reliability [3].

Application Notes: The State of Nanopore Sequencing for Parasite Genotyping

Long-read nanopore sequencing is revolutionizing the genotyping of pathogenic parasites, offering solutions to longstanding challenges in genomic surveillance, drug resistance monitoring, and field-based genomics. The technology's portability, real-time sequencing capability, and adaptability to low-resource settings make it particularly valuable for studying two major parasitic pathogens: Plasmodium falciparum, the deadliest malaria parasite, and Schistosoma species, the causative agents of schistosomiasis.

Plasmodium falciparum Genotyping Applications

For P. falciparum, nanopore sequencing has enabled rapid, cost-effective genomic surveillance that is deployable in endemic regions. The applications span multiple critical areas of malaria control:

Drug Resistance Monitoring: Targeted sequencing panels successfully track mutations in genes associated with resistance to key antimalarials, including kelch13 (artemisinin), crt and mdr1 (chloroquine), and dhfr/dhps (sulfadoxine-pyrimethamine) [13] [14].
Vaccine Target Surveillance: Amplification and sequencing of the circumsporozoite protein (csp) gene, the target of RTS,S and R21 vaccines, allows monitoring of potential vaccine escape mutants [13].
Malaria Diagnostics: Detection of deletions in hrp2/3 genes, which cause false negatives in widely used rapid diagnostic tests (RDTs), is possible with customized multiplex PCR panels [14].
Therapeutic Efficacy Studies: Multiplexed amplicon sequencing of microhaplotypes effectively distinguishes recrudescence from new infections in clinical trials, with demonstrated sensitivity for detecting minority clones at mixture ratios as low as 1:100:100:100 in polyclonal infections [3].
Species Identification: The 18S rRNA gene target enables discrimination between Plasmodium species and detection of mixed infections, crucial for appropriate treatment and understanding transmission dynamics [15] [13].

Schistosoma Species Genotyping Challenges

For Schistosoma species, particularly S. mansoni, research has focused on overcoming the challenges of obtaining sufficient parasite DNA from non-invasive samples:

Larval Stage Sequencing: Genotyping miracidia preserved on Whatman FTA cards is the most biologically representative way to sample parasites from natural infections, though these samples yield limited DNA and suffer from high contamination [16].
Contamination Management: Washed miracidia samples show higher proportions of S. mansoni DNA compared to unwashed samples, validating the effectiveness of physical washing for contamination removal, though this process is labor-intensive and can limit sample collection [16].
Adaptive Sampling Limitations: Unlike its success with Plasmodium, adaptive sampling has shown limited effectiveness for enriching S. mansoni DNA from contaminated samples, failing to generate sufficient reads for effective whole-genome sequencing in current implementations [16].

Emerging Methodologies and Their Applications

Innovative wet-lab and computational approaches are expanding nanopore applications for parasite genotyping:

Table 1: Key Experimental Approaches for Parasite Genotyping Using Nanopore Sequencing

Approach	Key Parasites	Primary Applications	Sample Input	Enrichment Factor/Performance
Adaptive Sampling	P. falciparum [17]	Whole-genome sequencing without prior enrichment	Unenriched blood	3-5× enrichment for 0.1-8.4% parasitemia; ~97% genome coverage at 0.1% parasitemia
Multiplex Amplicon Sequencing	P. falciparum [3] [13] [14]	Drug resistance, vaccine target, diagnostic marker surveillance	Dried blood spots, venous blood	Minority clones detected at 1:100:100:100; false-positive haplotypes < 0.01% [3]
18S rDNA Barcoding	Multiple blood parasites [15]	Species identification, mixed infection detection	Whole blood	Detection of 1-4 parasites/μL
Metagenomic Sequencing	Plasmodium spp. [18]	Comprehensive pathogen detection, species identification	EDTA blood	Positive correlation with parasitemia (Spearman r=0.7307)

Experimental Protocols

Protocol 1: Adaptive Sampling for Plasmodium falciparum Whole-Genome Sequencing

This protocol enables selective enrichment of P. falciparum DNA directly during sequencing, eliminating the need for prior laboratory-based enrichment steps [17].

Principle: Adaptive sampling uses real-time basecalling and sequence alignment to determine whether a DNA fragment should be sequenced to completion or ejected from the pore, thereby enriching for target organisms.

Materials:

Oxford Nanopore sequencer (MinION or PromethION)
Ligation Sequencing Kit (SQK-LSK114)
Flow Cell Wash Kit
Human DNA-depleted blood samples or mixed human-parasite DNA
MinKNOW software (v22.03.4 or higher)

Procedure:

DNA Extraction: Extract genomic DNA from patient blood samples using standard methods. Note: Prior leukocyte depletion is not required.
Library Preparation: Prepare sequencing library using the Ligation Sequencing Kit according to manufacturer's protocol without fragmentation.
Adaptive Sampling Setup:
- In MinKNOW, select "Adaptive Sampling" as the run type
- Upload a BED file containing coordinates for the P. falciparum reference genome (3D7 strain, available from PlasmoDB)
- Select "Enrichment" mode and set the reference for alignment
Sequencing: Load the library onto a flow cell and initiate sequencing.
Quality Control: Monitor enrichment efficiency in real-time through MinKNOW reports.
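
As a minimal sketch of the BED-file step above, assuming the reference FASTA has already been indexed (for example with samtools faidx, which writes a tab-separated .fai file whose first two columns are contig name and length), whole-contig target coordinates could be generated as follows. The function name fai_to_bed and the example contig names are illustrative and are not part of MinKNOW or the cited protocol.

```python
def fai_to_bed(fai_text: str) -> str:
    """Convert the text of a FASTA index (.fai) into whole-contig BED lines."""
    bed_lines = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        # BED intervals are 0-based and half-open, so a whole contig is 0..length.
        bed_lines.append(f"{name}\t0\t{length}\n")
    return "".join(bed_lines)


# Illustrative two-contig index with made-up names and lengths.
example_fai = "contig_A\t1000\t10\t60\t61\ncontig_B\t2500\t1100\t60\t61\n"
print(fai_to_bed(example_fai), end="")
```

Targeting every contig end to end is the simplest choice for whole-genome enrichment; a narrower BED file would restrict enrichment to selected loci.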

Performance Metrics: For samples with 0.1%-8.4% P. falciparum DNA, expect 3-5× enrichment of P. falciparum bases. A sample with 0.1% parasitemia should achieve ~97% genome coverage at median depth of 5× [17].
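
The two figures quoted above (fraction of the genome covered and median depth) can be computed from per-base depth values. The sketch below assumes a plain list of depths, one per reference position, such as could be parsed from the output of samtools depth -a; the function name and the toy data are hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth, median depth) for a list of per-base depths.

    Breadth is the fraction of positions with depth >= min_depth.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), median(depths)


# Toy example: ten positions, two of them uncovered.
toy_depths = [0, 3, 5, 5, 6, 7, 0, 9, 5, 4]
breadth, med = coverage_summary(toy_depths)
print(f"breadth={breadth:.0%}, median depth={med}")
```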

Protocol 2: DRAG2 Multiplex Amplicon Sequencing for Malaria Surveillance

The DRAG2 (Drug Resistance + Antigen Multiplex PCR) assay provides comprehensive surveillance of drug resistance markers and vaccine targets [13].

Principle: Targeted amplification of key genomic regions followed by nanopore sequencing enables cost-effective monitoring of multiple genetic markers simultaneously.

Materials:

Multiply software for primer design [14]
Native Barcoding Kit 96V14 (SQK-NBD114.96)
R10.4.1 flow cells
Dried blood spots or venous blood from infected patients
Custom primer pools for DRAG2-A and DRAG2-B reactions

Procedure:

Primer Design: Use multiply software to design multiplex PCR panels targeting genes of interest:
- kelch13, crt, mdr1, dhfr, dhps (drug resistance)
- csp, msp2 (vaccine and diversity markers)
- 18S rRNA (species identification)
DNA Extraction: Extract DNA from dried blood spots or venous blood.
Multiplex PCR:
- Perform two separate multiplex reactions (DRAG2-A and DRAG2-B)
- Use optimized cycling conditions: 98°C for 30s; 35 cycles of 98°C for 10s, 60°C for 30s, 65°C for 4min; final extension at 65°C for 5min
Library Preparation: Barcode and prepare sequencing library using Native Barcoding Kit.
Sequencing: Load onto MinION Mk1C with R10.4.1 flow cells, targeting ~25,000 reads per marker per sample.

Performance Metrics: The assay costs approximately $25 per sample and achieves uniform coverage across targets. It reliably detects SNPs in drug resistance loci with high concordance to Sanger sequencing [13] [14].

Protocol 3: 18S rDNA Barcoding for Multi-Parasite Detection

This protocol enables comprehensive detection of multiple blood parasite species using long-read 18S rDNA barcoding with host DNA suppression [15].

Principle: Universal primers amplify a ~1.2kb region of 18S rDNA from diverse eukaryotic pathogens, while blocking primers specifically inhibit amplification of host DNA.

Materials:

Universal primers F566 and 1776R
Blocking primers: 3SpC3Hs1829R (C3 spacer-modified) and PNAHs412F (peptide nucleic acid)
Blood samples with suspected parasitic infection
Oxford Nanopore portable sequencer

Procedure:

Blocking Primer Design:
- Design C3 spacer-modified oligos competing with universal reverse primer
- Design PNA oligos that inhibit polymerase elongation at host-specific binding sites
PCR with Host Suppression:
- Set up reactions with universal primers and blocking primers
- Use cycling conditions: 94°C for 2min; 40 cycles of 94°C for 30s, 60°C for 30s, 72°C for 2min; final extension at 72°C for 5min
Library Preparation and Sequencing:
- Prepare sequencing library using Ligation Sequencing Kit
- Sequence on MinION platform with real-time basecalling
Bioinformatic Analysis:
- Classify sequences using BLASTn against curated 18S rDNA database
- Apply rigorous cutoff criteria for species assignment
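
The classification step could be sketched as a filter over BLASTn tabular output (-outfmt 6, whose default twelve columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The identity and length thresholds below are placeholders chosen for illustration; the cited study's actual cutoff criteria are not given here, and the subject names are invented.

```python
import csv
import io
from typing import Dict

BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast_tsv: str, min_identity: float = 97.0, min_length: int = 1000) -> Dict[str, str]:
    """Keep the highest-bitscore hit per read that passes both cutoffs."""
    best = {}  # read id -> (subject id, bitscore)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        if float(hit["pident"]) < min_identity or int(hit["length"]) < min_length:
            continue  # hypothetical cutoffs, see lead-in
        score = float(hit["bitscore"])
        if hit["qseqid"] not in best or score > best[hit["qseqid"]][1]:
            best[hit["qseqid"]] = (hit["sseqid"], score)
    return {read: subject for read, (subject, _) in best.items()}


# Made-up hits: read2 fails the length cutoff, read1's host hit fails identity.
example = "\n".join([
    "read1\tPf_18S\t98.5\t1150\t17\t0\t1\t1150\t1\t1150\t0.0\t2000",
    "read1\tHs_18S\t90.0\t1100\t110\t0\t1\t1100\t1\t1100\t0.0\t1500",
    "read2\tBb_18S\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900",
    "read3\tTb_18S\t97.2\t1180\t33\t0\t1\t1180\t1\t1180\t0.0\t1900",
    "read3\tPf_18S\t97.5\t1170\t29\t0\t1\t1170\t1\t1170\t0.0\t1850",
])
assignments = best_hits(example)
print(assignments)
```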

Performance Metrics: Successfully detects Trypanosoma brucei rhodesiense, Plasmodium falciparum, and Babesia bovis in human blood samples spiked with as few as 1, 4, and 4 parasites per microliter, respectively [15].

Workflow Visualization

Diagram 1: Comprehensive workflow for parasite genotyping using nanopore sequencing, highlighting key decision points for method selection based on research objectives.

The Scientist's Toolkit: Essential Research Reagents

Table 2: Key Research Reagent Solutions for Parasite Genotyping Studies

Reagent/Kit	Primary Function	Application Examples	Performance Notes
Ligation Sequencing Kit (SQK-LSK114)	Library preparation for whole-genome sequencing	Adaptive sampling for P. falciparum [17], metagenomic sequencing [18]	Preserves long reads and native modifications; ~1 hour preparation time
Native Barcoding Kit 96V14 (SQK-NBD114.96)	Multiplexed sample preparation	DRAG2 assay [13], multiplexed amplicon sequencing [3]	Enables pooling of up to 96 samples; reduces per-sample cost
R10.4.1 Flow Cells	High-accuracy sequencing	Microhaplotype genotyping [3], SNP calling in drug resistance loci [13]	Improved raw read accuracy for reliable variant calling
Whatman FTA Cards	Sample preservation and storage	Miracidia collection for Schistosoma studies [16], dried blood spots for Plasmodium [14]	Stabilizes DNA at ambient temperature; ideal for field collections
Plasmodipur Filters	Leukocyte depletion	Enrichment of Plasmodium DNA from blood samples [17]	Reduces human DNA background; requires fresh blood samples
Custom Blocking Primers (C3/PNA)	Host DNA suppression	18S rDNA barcoding from blood [15]	Specifically inhibits amplification of host 18S rDNA
Synthetic Plasmid Controls	Quality control and contamination monitoring	DRAG2 assay validation [13]	Contains 'control' SNPs not found in nature to signal contamination

Technical Considerations and Limitations

While nanopore sequencing offers transformative potential for parasite genotyping, several technical constraints require consideration:

Parasitemia Requirements: Adaptive sampling for P. falciparum requires minimum parasitemia of 0.1% for adequate genome coverage [17]. Below this threshold, traditional enrichment methods remain necessary.
Contamination Challenges: For Schistosoma miracidia samples, adaptive sampling currently cannot overcome high contamination levels, making physical washing steps still necessary despite their labor-intensive nature [16].
Bioinformatic Requirements: Accurate species identification from metagenomic data requires optimized parameters: Kraken/Bracken achieved 50% correct species identification, which rose to 63.6% when results were processed with Pavian using z-scores [18].
Cost Considerations: Multiplexed amplicon sequencing reduces costs to ~$25 per sample for targeted surveillance [14], while whole-genome approaches with adaptive sampling remain more resource-intensive.
Sample Type Limitations: Dried blood spots enable widespread sample collection but yield lower DNA quantity and quality compared to venous blood, potentially affecting coverage uniformity [13].

From Theory to Practice: Methodological Workflows for Parasite Genotyping and Surveillance

Whole-genome sequencing (WGS) of parasites is fundamental to understanding the mechanisms of disease pathogenesis, drug resistance, and immune evasion. The complex genomic architecture of many parasites—characterized by highly repetitive regions, extensive segmental duplications, and dynamic gene families—has historically posed significant challenges for short-read sequencing technologies [2]. The advent and refinement of long-read sequencing (LRS), pioneered by platforms such as Oxford Nanopore Technologies (ONT), are now overcoming these limitations. ONT sequencing enables the generation of contiguous, high-fidelity genomic sequences that can span entire repetitive elements and structural variations, providing unprecedented resolution for parasitic genomics [2] [19]. This capability is particularly valuable for outbreak investigations, where the rapid identification of virulence factors and antimicrobial resistance genes is essential for guiding public health interventions [2]. Furthermore, the portability and real-time data analysis features of platforms like the MinION make advanced genomic surveillance feasible even in resource-limited, endemic settings, potentially transforming the pace and precision of parasite research and control [17] [2].

Key Applications and Advantages in Parasite Research

Resolving Complex Genomic Structures

The long reads generated by nanopore sequencing are uniquely suited to resolve the complex genomic features prevalent in parasitic organisms. For instance, Plasmodium falciparum, the parasite responsible for the most severe form of malaria, possesses a genome with extensive segmental duplications and subtelomeric gene families that are critical for immune evasion [2]. Similarly, the genomes of parasitic trypanosomes are defined by hypervariable and dynamic regions that facilitate antigenic variation, allowing the parasite to evade host immune responses [20]. Short-read technologies often fail to assemble these regions accurately, leaving gaps in genomic understanding. In contrast, a single nanopore read can span an entire repetitive element or multi-copy gene family, enabling phased haplotyping and the correct reconstruction of previously inaccessible genomic loci. This provides a more complete picture of the genetic mechanisms underlying parasite biology [2] [20].

Enhanced Genomic Surveillance and Drug Resistance Monitoring

Genomic surveillance of parasites in endemic regions is critical for tracking transmission dynamics, monitoring the emergence of drug resistance, and detecting deletions in diagnostic marker genes. Nanopore sequencing facilitates this by enabling rapid, on-site sequencing with minimal laboratory infrastructure.

A landmark study by Mwenda et al. (2025) demonstrated the deployment of nanopore sequencing across eight countries in sub-Saharan Africa for the genomic surveillance of Plasmodium falciparum [11]. The researchers used dried blood spots (DBS) as source material and developed a protocol that was low-cost (<$25 per sample), required fewer than half the pipetting steps of Illumina-based protocols, and delivered results in under 29 hours from DNA extraction to final analysis [11]. This approach successfully identified key drug-resistance mutations and hrp2/3 gene deletions associated with diagnostic test evasion, providing a scalable solution for real-time public health response [11].

Table 1: Performance Metrics from a Continental-Scale Malaria Genomic Surveillance Study [11]

Metric	Result/Description
Samples Processed	1,065 / 1,404 (75.8%) processed within Africa
Cost per Sample	< $25 USD
Turnaround Time	< 29 hours (from DNA extraction to results)
Key Targets	Drug-resistance mutations, hrp2/3 gene deletions
Primary Advantage	Accessible, rapid solution for local monitoring of outbreaks

Targeted Enrichment and Adaptive Sampling

A significant challenge in sequencing parasites from clinical samples is the high proportion of host DNA, which can make parasite DNA a minor component of the total nucleic acid pool. ONT's adaptive sampling feature addresses this problem bioinformatically, enriching for target sequences in real-time during the sequencing run [17]. When a DNA molecule is loaded into a nanopore, its sequence is determined in real-time. If the initial portion of the read is identified as originating from the host (e.g., human) genome, the voltage across the pore can be reversed to eject the molecule, freeing up the pore for another, potentially target, molecule [17].
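
The decision described above can be illustrated with a purely conceptual sketch. It assumes that the first chunk of each read has already been basecalled and aligned, and that the alignment result is available as a contig name (or None when nothing mapped). The names Decision and decide, the contig labels, and the choice to keep reading unmapped molecules are all assumptions for illustration; this is not MinKNOW's internal logic.

```python
from enum import Enum
from typing import Optional, Set


class Decision(Enum):
    SEQUENCE = "sequence to completion"
    EJECT = "reverse voltage and eject"
    KEEP_READING = "collect more signal before deciding"


def decide(first_chunk_hit: Optional[str],
           target_contigs: Set[str],
           host_contigs: Set[str]) -> Decision:
    """Choose what to do with a molecule based on where its first chunk aligned."""
    if first_chunk_hit in target_contigs:
        return Decision.SEQUENCE      # parasite read: keep it
    if first_chunk_hit in host_contigs:
        return Decision.EJECT         # host read: free the pore
    return Decision.KEEP_READING      # unmapped so far: no decision yet


targets = {"parasite_chr1", "parasite_chr2"}   # hypothetical contig names
hosts = {"host_chr1"}
for hit in ("parasite_chr2", "host_chr1", None):
    print(hit, "->", decide(hit, targets, hosts).value)
```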

Research has shown that adaptive sampling can achieve a 3- to 5-fold enrichment of Plasmodium falciparum DNA in samples containing only 0.1%–8.4% parasite DNA [17]. In patient blood samples with parasitemia levels as low as 0.1%, this enrichment was sufficient to cover over 97% of the P. falciparum reference genome and accurately call 38 drug resistance loci with high concordance to Sanger sequencing results [17]. This method presents a powerful tool for enriching parasite DNA without the need for time-consuming laboratory-based enrichment protocols.

De Novo Genome Assembly and Transcriptome Characterization

For parasites lacking high-quality reference genomes, de novo assembly using long reads is invaluable. Studies on parasitic nematodes, including Brugia malayi, Trichuris trichiura, and Ancylostoma caninum, have demonstrated that de novo assemblies generated using only MinION data exhibit similar or superior contiguity and completeness compared to existing references [21]. Modified protocols have even enabled WGS from single helminth specimens, opening new avenues for researching parasites that are difficult to obtain in large quantities [21].

Beyond the genome, nanopore sequencing also allows for full-length transcriptome characterization. This capability is crucial for identifying novel transcripts and splice variants that may play roles in parasite development and virulence. Although more commonly applied in cancer research, this strength of nanopore sequencing is directly transferable to parasite transcriptomics, promising insights into gene regulation and expression in different parasitic life stages [11].

Detailed Experimental Protocols

Protocol 1: Rapid Genomic Surveillance of Plasmodium falciparum from Dried Blood Spots

This protocol, adapted from Mwenda et al. (2025), is designed for high-throughput, cost-effective surveillance in resource-limited settings [11].

Step 1: Sample Collection and DNA Extraction. Collect patient blood on filter paper to create dried blood spots (DBS). Using a hole punch, transfer a segment of the DBS to a tube for DNA extraction. Use a commercially available DNA extraction kit, with an optional step to optimize the elution volume to ensure sufficient DNA concentration for library preparation.
Step 2: Library Preparation. Utilize the ONT "Rapid" library preparation kit (e.g., SQK-RBK114.96). This kit is ideal for this application as it involves minimal pipetting steps (reduced by more than half compared to Illumina protocols), reduces hands-on time, and keeps costs low (under $25 per sample) [11]. Following the manufacturer's guidelines, bind the DNA to beads, wash, and then elute in the provided buffer. Add the rapid barcodes to allow for multiplexing of up to 96 samples per flow cell.
Step 3: Sequencing and Real-Time Analysis. Load the pooled library onto a MinION flow cell that matches the kit chemistry (Kit 14 kits such as SQK-RBK114.96 require R10.4.1 flow cells). Start the sequencing run and the real-time analysis software. For targeted analysis of drug-resistance markers and diagnostic gene deletions, a custom workflow can be used to basecall, demultiplex, and align reads in real time, providing actionable results within 24 hours [11].

Protocol 2: Enrichment of Parasite DNA via Adaptive Sampling

This protocol is for sequencing directly from patient blood samples when host DNA has not been depleted in the wet lab [17].

Step 1: DNA Extraction and Quality Control. Extract high-molecular-weight DNA from a patient blood sample using a method that preserves long fragment length (e.g., Qiagen MagAttract HMW DNA Kit). Quantify the DNA and assess fragment size distribution using a Fragment Analyzer or agarose gel electrophoresis. The average fragment size should ideally be above 20 kb.
Step 2: Library Preparation for Adaptive Sampling. Prepare a sequencing library using the ONT Ligation Sequencing Kit (SQK-LSK109). This kit is recommended because it preserves the native length of the DNA fragments, which is crucial for the efficiency of adaptive sampling.
Step 3: Sequencing with Adaptive Sampling. Load the library onto the flow cell. In the sequencing control software (MinKNOW), enable the "adaptive sampling" feature. Upload the reference genome for the parasite (e.g., Plasmodium falciparum 3D7) and the host (e.g., human GRCh38). Configure the software to enrich for reads mapping to the parasite genome and deplete reads mapping to the host genome. The sequencing run will proceed, with human reads being ejected early (typically after ~400 bases) to prioritize the sequencing of parasite DNA [17].

Table 2: Comparison of Two Key Parasite WGS Workflows

Aspect	Rapid DBS Surveillance [11]	Adaptive Sampling from Blood [17]
Input Material	Dried Blood Spots (DBS)	Whole blood (with high molecular weight DNA)
Best For	High-throughput, cost-effective field surveillance	Samples with low parasitemia where wet-lab enrichment is not desired
Key Benefit	Low cost, simple workflow, high portability	In silico enrichment; avoids laboratory steps for host DNA depletion
Typical Enrichment	N/A (relies on high multiplexing)	3- to 5-fold enrichment of parasite DNA
Parasitemia Range	Not specified	Effective from 0.1% and higher

Protocol 3: De Novo Genome Assembly from a Single Parasite

This protocol, inspired by the work on helminths, is useful for generating reference genomes for novel parasite species or strains [21].

Step 1: Single Parasite DNA Extraction. Isolate a single parasite specimen and wash it thoroughly in buffer to remove contaminating material. Use a modified DNA extraction protocol that includes a mechanical lysis step (e.g., bead beating) to ensure complete disruption of the parasite's tough outer structures, followed by enzymatic purification. The goal is to maximize DNA yield from a single organism.
Step 2: Library Preparation for Ultra-Long Reads. To achieve the best possible assembly, aim for ultra-long reads. Use a library preparation kit designed for long fragments (e.g., ONT Ligation Sequencing Kit) and avoid any shearing or fragmentation steps. Size selection can be performed to enrich for the longest fragments.
Step 3: Sequencing and Assembly. Sequence the DNA on a MinION or PromethION flow cell to generate high-coverage data. For assembly, use long-read-specific assemblers such as Flye or Canu [22] [21]. The resulting assembly will consist of a small number of highly contiguous contigs. Polish the initial assembly using the same long-read data with tools like Medaka to improve base-level accuracy.
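
Contiguity of the resulting assembly is commonly summarized with N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch, with a hypothetical function name and toy contig lengths:

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative sum reaches half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_contigs = [100, 200, 300, 400, 500]  # toy lengths in bp
print(n50(toy_contigs))
```

A higher N50 at similar total assembly size indicates fewer, longer contigs, which is the sense in which the MinION-only assemblies are described as more contiguous.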

The Scientist's Toolkit: Essential Reagents and Materials

Table 3: Key Research Reagent Solutions for Parasite WGS

Item	Function/Description	Example Product/Catalog
Dried Blood Spot (DBS) Cards	Stable, simple sample collection and storage from finger-prick or venous blood.	Whatman 903 Protein Saver Card
Rapid Barcoding Kit	Fast, minimal-step library prep for multiplexing up to 96 samples; ideal for field surveillance.	Oxford Nanopore SQK-RBK110.96
Ligation Sequencing Kit	High-quality library prep for long and ultra-long reads; required for adaptive sampling and genome assembly.	Oxford Nanopore SQK-LSK109
Native Barcoding Kit	Allows multiplexing of samples prepared with the Ligation Sequencing Kit.	Oxford Nanopore EXP-NBD104/114
MinION Flow Cell (R10.4.1)	The consumable containing nanopores; R10.4.1 offers improved basecalling accuracy.	Oxford Nanopore FLO-MIN114
Flye Assembler	Bioinformatics software for de novo genome assembly from long reads.	https://github.com/fenderglass/Flye
Dorado Basecaller	ONT's optimized software for converting raw electrical signal to nucleotide sequence.	https://github.com/nanoporetech/dorado

Workflow and Pathway Visualizations

Adaptive Sampling for Parasite Enrichment

Comprehensive Parasite Whole-Genome Sequencing Workflow

Targeted Amplicon Sequencing (AmpSeq) for High-Sensitivity Resistance Profiling

Targeted Amplicon Sequencing (AmpSeq) has emerged as a powerful methodology for high-throughput genomic surveillance of parasitic diseases, particularly for profiling drug resistance markers in Plasmodium falciparum. This technique utilizes multiplex PCR to selectively amplify specific genomic regions of interest, followed by high-throughput sequencing on platforms such as Oxford Nanopore Technologies (ONT) [3] [13]. AmpSeq addresses critical limitations of traditional genotyping methods, including the amplification bias associated with length-polymorphic markers like msp1, msp2, and glurp, which can lead to preferential amplification of shorter fragments and loss of longer alleles in multi-clonal infections [23]. The method's exceptional sensitivity enables detection of minority clones in polyclonal infections at frequencies as low as 0.1%–1%, providing crucial insights into parasite dynamics that were previously undetectable [3] [23].

Within the context of long-read nanopore sequencing, AmpSeq offers distinct advantages for parasite genotyping research, including portability, real-time sequencing capabilities, and relatively low operational costs [13]. These characteristics make it particularly suitable for deployment in endemic settings, where rapid genomic surveillance can inform treatment policies and containment strategies. The integration of AmpSeq with nanopore technology represents a significant advancement in monitoring antimalarial drug resistance, detecting emerging vaccine escape mutants, and distinguishing recrudescence from new infections in therapeutic efficacy studies [3] [13].

Performance Characteristics and Applications

Quantitative Performance Metrics of AmpSeq Assays

The utility of AmpSeq for resistance profiling hinges on its robust performance characteristics across various parasite densities and sample types. The table below summarizes key performance metrics from recent AmpSeq assays developed for P. falciparum genotyping.

Table 1: Performance Characteristics of Targeted AmpSeq Assays for Malaria Parasites

Assay Name	Sensitivity (Parasite Density)	Minority Clone Detection	Markers	Primary Application	Reference
SIMPLseq	100% locus detection at ≥0.5 parasites/μL	50% average locus detection at 0.125-0.25 parasites/μL	6-plex	High-sensitivity genotyping, infection endpoints	[24]
Nanopore AmpSeq (Microhaplotypes)	High sensitivity across natural parasitemia (31-33,930 parasites/μL)	1:100:100:100 in strain mixtures	6 microhaplotypes	Distinguishing recrudescence from new infection	[3]
DRAG2	Effective across venous blood and dried blood spots	Not specified	9 targets (drug resistance + antigens)	Expanded drug resistance and species surveillance	[13]
Illumina AmpSeq	Reproducible across sample types	1:100 dilution in control mixtures	5 SNP-rich markers	PCR-correction in clinical trials	[23]

The high sensitivity of modern AmpSeq assays enables reliable genotyping even at very low parasite densities, which is crucial for accurate classification of recurrent infections in antimalarial drug trials [24]. The SIMPLseq assay demonstrates exceptional performance, maintaining 100% average locus detection at densities as low as 0.5 parasites/μL, with some detection capability extending to 0.125 parasites/μL [24]. This sensitivity is complemented by high specificity, with false-positive haplotypes reported below 0.01% in well-optimized assays [3].

Applications in Antimalarial Drug Efficacy Studies

AmpSeq has proven particularly valuable in therapeutic efficacy studies (TES) for distinguishing recrudescent (true treatment failure) from new infections. Traditional methods using capillary electrophoresis for length-polymorphic markers face limitations in detecting complex polyclonal infections and minority clones [23]. In a direct comparison, AmpSeq demonstrated superior performance, with discordance between markers in only six patients compared to eleven patients with length-polymorphic markers [23]. The nanopore AmpSeq assay targeting six microhaplotype loci consistently distinguished recrudescence from new infections in 17 out of 20 cases (85%) for all six markers, highlighting its reliability for drug efficacy evaluations [3].

The method's capacity to detect minority clones is vital for understanding the complex dynamics of parasite populations under drug pressure. In controlled experiments with laboratory strain mixtures, AmpSeq reliably identified minority clones at ratios of 1:100:100:100 (3D7:K1:HB3:FCB1 strains), demonstrating sufficient sensitivity to detect emerging resistant subpopulations before they become dominant [3]. This capability is increasingly important with the spread of artemisinin partial resistance (ART-R) mediated by kelch13 mutations, particularly as these mutations emerge in African regions where the malaria burden is highest [3].
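
To make the mixture ratio concrete, the expected share of each strain follows directly from the ratio terms: in a 1:100:100:100 mixture the minority strain contributes 1 part in 301. A short worked calculation, with a hypothetical helper name:

```python
from typing import Dict


def mixture_fractions(ratio: Dict[str, float]) -> Dict[str, float]:
    """Convert mixing-ratio parts into expected fractions of the total."""
    total = sum(ratio.values())
    return {strain: parts / total for strain, parts in ratio.items()}


ratio = {"3D7": 1, "K1": 100, "HB3": 100, "FCB1": 100}
fractions = mixture_fractions(ratio)
for strain, frac in fractions.items():
    print(f"{strain}: {frac:.2%}")
```

The minority strain therefore sits at roughly 0.33% of the template, assuming the strains were mixed in exact proportion to the stated ratio.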

Research Reagent Solutions

The successful implementation of AmpSeq for resistance profiling depends on carefully selected research reagents and materials. The following table outlines essential components for establishing a robust AmpSeq workflow.

Table 2: Essential Research Reagents for AmpSeq-Based Resistance Profiling

Reagent Category	Specific Examples	Function in Workflow	Considerations
Polymerase	Q5 High-Fidelity DNA Polymerase	Amplification of target regions with minimal errors	High fidelity crucial for accurate variant calling; can be customized [25]
Primer Panels	DRAG2 (9-plex), SIMPLseq (6-plex), Microhaplotype (6-plex)	Target-specific amplification	Designed for uniform amplification efficiency; contain drug resistance markers [3] [13] [24]
Sequencing Kit	ONT Native Barcoding Kit 96 V14	Library preparation and barcoding	Enables multiplexing of samples; compatible with MinION platform [3]
Sequencing Platform	MinION Mk1C with R10.4.1 flow cells	Portable, real-time sequencing	Suitable for field deployment; R10.4.1 chemistry improves accuracy [3]
Positive Controls	Synthetic plasmids with "control" SNPs	Quality control and contamination monitoring	Engineered with unnatural SNPs to signal contamination if detected [13]
DNA Extraction Kit	Qiagen DNeasy Tissue and Blood kit	Nucleic acid purification from clinical samples	Effective with various sample types including dried blood spots [26] [13]

The selection of high-fidelity DNA polymerase is particularly critical for minimizing amplification errors that could be misinterpreted as genuine polymorphisms [25]. Primer panels should be designed to target the most informative regions for resistance profiling, with assays like DRAG2 incorporating key drug resistance markers (crt, dhfr, dhps, mdr1, kelch13) alongside antigenic targets (csp, msp2) for comprehensive surveillance [13]. The inclusion of synthetic plasmids as positive controls provides an economical quality control measure, with engineered "control" SNPs that serve as indicators of contamination if detected in clinical samples [13].
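
The control-SNP idea can be sketched as a simple set intersection: any clinical sample whose variant calls include an engineered control SNP is flagged for review. The variant representation (marker, position, alternate base), the marker names, and the positions below are invented for illustration and do not correspond to the actual DRAG2 controls.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str]  # (marker, position, alternate base)


def flag_control_contamination(sample_calls: Dict[str, Set[Variant]],
                               control_snps: Set[Variant]) -> Dict[str, List[Variant]]:
    """Return samples whose calls contain any engineered control SNP."""
    flagged = {}
    for sample, calls in sample_calls.items():
        hits = sorted(calls & control_snps)
        if hits:
            flagged[sample] = hits
    return flagged


# Hypothetical control SNPs and per-sample variant calls.
control_snps = {("marker_X", 101, "T"), ("marker_Y", 250, "G")}
sample_calls = {
    "sample_01": {("marker_X", 55, "A")},
    "sample_02": {("marker_X", 101, "T"), ("marker_Y", 12, "C")},
}
flagged = flag_control_contamination(sample_calls, control_snps)
print(flagged)
```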

Experimental Protocol for AmpSeq Resistance Profiling

Sample Preparation and Quality Control

Begin with genomic DNA extraction from patient samples, which may include venous blood or dried blood spots (DBS). For Plasmodium samples, use approximately 5 μL of genomic DNA at 4 ng/μL concentration [27]. Assess DNA quality and concentration using fluorometric methods (e.g., Qubit Fluorometer) and agarose gel electrophoresis [27]. To exclude maternal cell contamination in prenatal applications or cross-contamination in field samples, perform short tandem repeat (STR) analysis using commercially available systems [27]. For parasite samples, determine parasitemia by microscopy or qPCR to establish expected DNA yield [3].

Multiplex PCR Amplification

Perform multiplex PCR reactions in a 20 μL reaction system containing:

5 μL genomic DNA (4 ng/μL)
2 μL index-containing primer (M-primer)
3 μL UPD primer pool or target-specific primer mix
10 μL of 2× multiplex PCR mix [27]

Utilize optimized thermal cycling conditions:

Initial denaturation: 95°C for 2 minutes
20 cycles of:
- Denaturation: 95°C for 30 seconds
- Annealing/extension: 60°C for 4 minutes
Final extension: 72°C for 5 minutes [27]

For assays requiring higher sensitivity, such as SIMPLseq, a two-step PCR approach is recommended, with well-specific inline barcodes incorporated during the first-round PCR to track potential contamination [24]. Primer pools should be balanced to ensure uniform amplification efficiency across targets, with amplicon sizes kept similar (e.g., 459-975 bp in the DRAG2 assay) to minimize size-based amplification bias [13].

Library Preparation and Nanopore Sequencing

Purify PCR products using magnetic beads to remove primers and non-specific amplification products. For nanopore sequencing, prepare libraries using the ONT Native Barcoding Kit 96 V14 according to manufacturer's instructions with minor modifications [3]. Load libraries onto R10.4.1 flow cells and sequence on the MinION Mk1C platform using MinKNOW software (v24.06.15 or later). Aim for approximately 25,000 reads per marker per sample, or 150,000 reads total, to compensate for downstream filtering of low-quality reads [3]. Sequence until the desired coverage is achieved, typically requiring 4-24 hours depending on sample multiplexing level and parasite density.
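
The read targets above follow from simple multiplication: six markers at 25,000 reads each gives 150,000 reads per sample. A small sketch for planning a run, assuming every sample in a pool receives the same target; the helper names are hypothetical, and whether a given flow cell delivers the resulting total is not addressed here.

```python
def reads_per_sample(n_markers: int, reads_per_marker: int = 25_000) -> int:
    """Target read count for one sample."""
    return n_markers * reads_per_marker


def reads_per_run(n_samples: int, n_markers: int, reads_per_marker: int = 25_000) -> int:
    """Target read count for a pooled run of n_samples barcoded samples."""
    return n_samples * reads_per_sample(n_markers, reads_per_marker)


print(reads_per_sample(6))        # six microhaplotype markers
print(reads_per_run(96, 6))       # a full 96-barcode pool
```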

Bioinformatic Analysis and Haplotype Calling

Process raw sequencing data through a standardized bioinformatics pipeline:

Basecalling and demultiplexing: Use Dorado (v0.8.2) for simplex basecalling with super-accurate (sup) model, minimum Q-score of 20 (accuracy ≥99%) [3]
Adapter trimming: Remove adapters and low-quality sequences using Cutadapt (v1.10) [27]
Alignment: Map reads to reference genome (e.g., GRCh37 for human, PlasmoDB for Plasmodium) using Burrows-Wheeler Aligner (BWA-MEM) [27]
Variant calling: Identify SNPs using specialized tools like VarScan (v2.4.3) [27] or HaplotypR [23]
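
The relationship between the Q-score threshold and accuracy in the basecalling step is the Phred scale, accuracy = 1 - 10^(-Q/10), so Q20 corresponds to 99%. The sketch below also shows one common convention for a read-level score, which averages per-base error probabilities rather than the Q values themselves; whether a particular basecaller applies its filter in exactly this way is an assumption here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to expected accuracy."""
    return 1.0 - 10 ** (-q / 10)


def mean_read_qscore(base_qscores: Sequence[float]) -> float:
    """Read-level Q computed from the mean per-base error probability."""
    error_probs = [10 ** (-q / 10) for q in base_qscores]
    mean_error = sum(error_probs) / len(error_probs)
    return -10 * math.log10(mean_error)


def passes_filter(base_qscores: Sequence[float], min_q: float = 20.0) -> bool:
    """Apply a minimum read-level Q-score threshold."""
    return mean_read_qscore(base_qscores) >= min_q


print(f"Q20 accuracy: {phred_to_accuracy(20):.2%}")
print(f"read Q for bases [10, 30]: {mean_read_qscore([10, 30]):.2f}")
```

Averaging error probabilities means a few low-quality bases pull the read score down more than a plain mean of Q values would suggest.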

Apply stringent cut-off thresholds for haplotype calling to balance sensitivity and specificity. Established parameters include:

Minimum of 10 reads per haplotype
Minimum of 50 reads per sample
1% minimum within-host frequency for minority clones [23]

For regulatory trials, employ conservative thresholds that may overestimate rather than underestimate treatment failure rates to ensure patient safety [23].
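
The three thresholds listed above can be expressed as a small filter over per-haplotype read counts for one marker in one sample. This is a sketch under the assumption that within-host frequency is computed as haplotype reads divided by all reads for that marker; it is not the HaplotypR implementation, and the function name and counts are illustrative.

```python
from typing import Dict


def call_haplotypes(read_counts: Dict[str, int],
                    min_reads_per_haplotype: int = 10,
                    min_reads_per_sample: int = 50,
                    min_frequency: float = 0.01) -> Dict[str, float]:
    """Return haplotypes passing all thresholds, mapped to their frequency."""
    total = sum(read_counts.values())
    if total < min_reads_per_sample:
        return {}  # too little coverage to call anything for this sample
    called = {}
    for haplotype, count in read_counts.items():
        frequency = count / total
        if count >= min_reads_per_haplotype and frequency >= min_frequency:
            called[haplotype] = frequency
    return called


counts = {"hapA": 900, "hapB": 95, "hapC": 5}   # toy read counts
calls = call_haplotypes(counts)
print(calls)
```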

Table 3: Quality Control Parameters for AmpSeq Data Analysis

QC Parameter	Threshold	Purpose	Consequence if Not Met
Read Quality Score	Q20 (≥99% accuracy)	Filter low-quality reads	Exclude from analysis to reduce errors
Minimum Coverage	50 reads/sample	Ensure statistical reliability	Flag sample for potential re-sequencing
Haplotype Frequency	1% minimum	Detect minority clones	May miss low-frequency resistant variants
Replicate Concordance	≥2/3 replicates	Confirm haplotype validity	Exclude as potential PCR artefact
Negative Controls	Zero contamination	Monitor cross-contamination	Investigate source and re-run if contaminated

Workflow Visualization

The following diagram illustrates the complete AmpSeq workflow for high-sensitivity resistance profiling, from sample preparation to data analysis:

Targeted Amplicon Sequencing represents a transformative approach for high-sensitivity resistance profiling in parasite genotyping research. When integrated with long-read nanopore sequencing technology, AmpSeq provides a powerful tool for tracking antimalarial drug resistance, distinguishing recrudescence from new infections, and detecting minority clones that may represent emerging resistant subpopulations. The methodology offers significant advantages over traditional genotyping techniques, including superior sensitivity, reduced amplification bias, and the capacity for high-throughput implementation in endemic settings. As resistance markers continue to evolve and spread, particularly in high-transmission regions, AmpSeq will play an increasingly vital role in informing treatment policies and containment strategies, ultimately contributing to more effective malaria control and elimination efforts.

In antimalarial drug trials, a critical challenge lies in distinguishing between recrudescence (true treatment failure where the original infection persists) and new infections (acquired from a new mosquito bite) when patients present with recurrent parasitemia [3]. This distinction is vital for calculating accurate, genotype-corrected efficacy estimates, which form the primary outcome of Therapeutic Efficacy Studies (TES) mandated by the World Health Organization (WHO) [3]. Conventional methods using capillary electrophoresis to genotype size-polymorphic markers like msp1, msp2, and microsatellites are limited in resolution and throughput [3]. The emergence of artemisinin-resistant Plasmodium falciparum parasites and subsequent treatment failure of artemisinin-based combination therapies (ACTs) elevates this from a methodological concern to an urgent public health priority, particularly with the recent independent emergence of resistance in East Africa and the Horn of Africa [3].

Multiplex PCR panels, particularly those leveraging long-read nanopore sequencing, represent a paradigm shift in addressing this challenge. Unlike traditional methods, these panels simultaneously amplify multiple, highly polymorphic genetic loci, enabling high-resolution strain typing [3]. The integration of nanopore technology (Oxford Nanopore Technologies, ONT) offers a portable, scalable, and rapid sequencing solution that is particularly suited for deployment in resource-limited, endemic settings [3] [28]. This protocol details the application of a nanopore-sequenced multiplex amplicon sequencing (AmpSeq) panel for precise molecular correction in clinical trials.

Experimental Design and Workflow

The core objective of the protocol is to genetically compare parasite populations from two time points: the day zero (D0) sample, collected before treatment initiation, and the sample collected on the day of recurrent parasitemia. The workflow, from sample to analysis, is designed for robustness and efficiency.

Workflow Visualization

The following diagram illustrates the complete experimental and bioinformatics workflow for distinguishing recrudescence from new infections:

Sample Preparation and DNA Extraction

Principle: Obtain high-quality P. falciparum genomic DNA from paired patient whole blood samples.

Sample Type: Frozen whole blood samples (paired D0 and recurrence) from clinical trial participants [3].
Parasitemia Range: The protocol has been validated for samples with parasitemia ranging from 31 to 33,930 parasites/μL [3].
Ethical Considerations: All clinical trial procedures must receive prior approval from relevant Independent Ethics Committees or Institutional Review Boards. Informed consent must be obtained from all subjects or their legal guardians [3].
DNA Extraction: Standard commercial kits for genomic DNA extraction from whole blood, such as the Qiagen EZ1 DNA Tissue kit, are suitable [28]. Ensure DNA is eluted in a low-EDTA or EDTA-free buffer to be compatible with subsequent enzymatic steps.

The Scientist's Toolkit: Key Research Reagent Solutions

The following table catalogues the essential reagents and materials required to execute the nanopore AmpSeq genotyping protocol successfully.

Table 1: Essential Research Reagents and Materials for Nanopore AmpSeq Genotyping

Item Name	Function/Application	Specifications/Notes
Native Barcoding Kit 96 V14 (SQK-NBD114.96) [3]	Prepares multiplexed libraries for nanopore sequencing by attaching unique barcodes to each sample.	Enables pooling of up to 96 samples per sequencing run, optimizing cost and throughput.
R10.4.1 Flow Cells [3]	The consumable containing nanopores for sequencing.	Latest chemistry at time of publication; provides improved basecalling accuracy, crucial for SNP calling.
Multiplex PCR Primer Pool [3]	Simultaneously amplifies the six target microhaplotype loci from genomic DNA.	Contains published primer sequences for ama1, celtos, cpmp, csp, cpp, and surfin1.1 [3].
Phusion High-Fidelity PCR Master Mix [28]	Amplifies target regions with high fidelity and yield.	Essential for minimizing PCR-introduced errors that could confound haplotype calling.
MinION Mk1C Sequencer [3]	The portable sequencing device that performs the sequencing run.	Integrates compute and MinION sequencer, allowing for standalone operation in the field.
Dorado Basecaller [3]	Converts raw electrical signal data from the sequencer into nucleotide sequences (FASTQ files).	Use "super-accurate (sup)" model with minimum Q-score of 20 for high accuracy (≥99%).

Multiplex PCR Amplification of Microhaplotype Loci

Principle: Amplify multiple, short, and highly polymorphic genomic regions in a single reaction to generate sufficient material for sequencing and enable high-resolution strain discrimination.

Panel Composition and Selection Criteria

The optimized 6-plex PCR panel targets the following microhaplotype loci: ama1, celtos, cpmp, cpp, csp, and surfin1.1 [3]. These loci were selected based on:

High Genetic Diversity: They exhibit high heterozygosity (He), with cpmp being the most diverse (He=0.99, 28 unique haplotypes) [3].
Discriminatory Power: The abundance of Single Nucleotide Polymorphisms (SNPs) in these short loci greatly improves the ability to distinguish between different parasite strains [3] [28].
Amplification Efficiency: The loci are amenable to multiplexing and demonstrate uniform amplification efficiency, which is critical for balanced read coverage [3].
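Expected heterozygosity (He) is the probability that two haplotypes drawn at random from the population differ. The source does not state which estimator was used; the sketch below assumes the common sample-size-corrected form, He = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n observations. The function name and input format are illustrative.

```python
from collections import Counter


def expected_heterozygosity(haplotypes):
    """Sample-size-corrected expected heterozygosity for one locus.

    haplotypes: list of haplotype labels, one per observed clone.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two observations are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing a match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


if __name__ == "__main__":
    # Two haplotypes at equal frequency among four observations.
    print(round(expected_heterozygosity(["a", "a", "b", "b"]), 3))
```

A locus where nearly every observed clone carries a different haplotype approaches He = 1, which is the situation described for cpmp.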

PCR Protocol and Cycling Conditions

The multiplex PCR should be performed using a high-fidelity DNA polymerase to minimize errors.

Reaction Volume: 10-50 μL, scalable based on needs.
DNA Input: 2-10 μL of extracted genomic DNA.
Primer Pool: Use optimized concentrations for each primer pair in the 6-plex pool to ensure uniform coverage (specific concentrations are available in the protocol's supplementary tables) [3].
Cycling Conditions:
- Initial Denaturation: 98°C for 3 minutes.
- Amplification Cycles (40 cycles):
  - Denaturation: 98°C for 10 seconds
  - Annealing: 60°C for 30 seconds
  - Extension: 72°C for 15 seconds
- Final Extension: 72°C for 5 minutes [28].
Post-PCR Purification: Clean amplified products using a magnetic bead-based purification system (e.g., Agencourt AMPure XP beads) to remove primers, enzymes, and salts [28].

Library Preparation, Sequencing, and Bioinformatics

Library Preparation and Sequencing

The purified multiplex PCR amplicons are processed for nanopore sequencing.

Library Preparation Kit: Native Barcoding Kit 96 V14 (SQK-NBD114.96) is used according to the manufacturer's instructions with minor modifications [3]. A detailed laboratory protocol is available online at protocols.io [3].
Sequencing Device: MinION Mk1C with MinKNOW software.
Flow Cell Type: R10.4.1.
Sequencing Depth: Target approximately 25,000 reads per marker per sample (150,000 reads total per sample) to ensure sufficient depth for sensitive detection of minority clones [3].
Controls: Include negative controls (nuclease-free water taken through the entire workflow) and a positive control (e.g., a known parasite strain like FCB1) in each run for quality assurance [3].

Bioinformatic Analysis and Haplotype Calling

A custom bioinformatics pipeline is used to infer haplotypes from the raw sequencing data with high confidence.

Basecalling and Demultiplexing: Raw signals are basecalled and barcodes are assigned using Dorado (v0.8.2) with the "super-accurate" model and a minimum Q-score of 20 [3].
Haplotype Inference: A rigorous inferential method is applied to call haplotypes, including minority clones in polyclonal infections, using established cutoff criteria [3]. This involves:
- Aligning reads to reference sequences for each locus.
- Identifying SNP patterns to reconstruct distinct haplotypes.
- Applying frequency thresholds to filter out false-positive haplotypes (false-positive rate maintained at <0.01%) [3].
Output: A list of haplotypes and their relative frequencies for each sample and each locus.
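The haplotype inference steps above can be illustrated with a toy example. This is a simplified sketch and not the custom pipeline referenced in the text: it assumes reads have already been aligned without gaps to the same coordinates as the locus reference, that the variable positions are known, and it uses illustrative default thresholds. A real pipeline would also need to handle indels, sequencing errors, and chimeric reads.

```python
from collections import Counter


def call_haplotypes(aligned_reads, snp_positions, min_reads=10, min_freq=0.01):
    """Group reads by their bases at known SNP positions.

    aligned_reads: list of read strings in reference coordinates (assumed gapless).
    snp_positions: 0-based positions of variable sites in the locus.
    Returns a dict mapping SNP pattern -> within-sample frequency for
    patterns that pass both thresholds.
    """
    patterns = Counter(
        "".join(read[p] for p in snp_positions) for read in aligned_reads
    )
    total = sum(patterns.values())
    return {
        pattern: count / total
        for pattern, count in patterns.items()
        if count >= min_reads and count / total >= min_freq
    }


if __name__ == "__main__":
    # 90 reads of a majority clone, 10 of a minority clone, 1 erroneous read.
    reads = ["ACGTAC"] * 90 + ["ATGTGC"] * 10 + ["ACGTGC"]
    for pattern, freq in sorted(call_haplotypes(reads, [1, 4]).items()):
        print(pattern, round(freq, 3))
```

The single read with an unsupported SNP pattern is discarded by the read-count threshold, while the minority clone at roughly 10% is retained.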

Performance Metrics and Data Interpretation

Analytical Performance of the Nanopore AmpSeq Assay

The optimized assay demonstrates performance characteristics that meet the stringent requirements for clinical trial genotyping.

Table 2: Analytical Performance Metrics of the Nanopore AmpSeq Assay

Performance Parameter	Result	Experimental Basis
Sensitivity for Minority Clones	Detects clones at ratios as low as 1:100:100:100 (minority:majority) [3].	Testing with defined mixtures of four P. falciparum lab strains (3D7, K1, HB3, FCB1).
Specificity (False Positive Rate)	< 0.01% for false-positive haplotypes [3].	Analysis of negative controls and haplotype calling in complex mixtures.
Reproducibility (Accuracy)	Intra-assay: 98%; Inter-assay: 97% [3].	Concordance of haplotype calls across technical replicates and different sequencing runs.
Read Coverage	Uniform and high coverage across all 6 markers in both lab strains and patient samples [3].	Assessment of read depth distribution per locus per sample.
Genetic Diversity of Markers	Highest for cpmp (He=0.99, 28 haplotypes) [3].	Analysis of haplotype diversity in a natural parasite population.

Interpretation of Results for Clinical Endpoints

Principle: Compare the haplotype profiles between the D0 and recurrence samples for each patient.

Recrudescence: Defined by the presence of one or more identical haplotypes at one or more loci in both the D0 and recurrence samples. This indicates the original infection was not cleared by the treatment [3].
New Infection: Defined by the complete absence of shared haplotypes between the D0 and recurrence samples across all genotyped loci. This indicates the recurrence was caused by a genetically distinct parasite [3].
Mixed Infection: In some cases, the recurrence sample may contain a mixture of recrudescent and new infection haplotypes.
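The three definitions above translate directly into set comparisons. The sketch below applies them literally; the function name, the input layout, and the rule used to flag a mixed infection (at least one shared haplotype plus at least one recurrence haplotype absent from D0) are assumptions for illustration. Trial protocols may specify stricter match-counting rules across loci.

```python
def classify_recurrence(day0, recurrence):
    """Classify a recurrent infection from paired haplotype profiles.

    day0, recurrence: dicts mapping locus name -> set of haplotype IDs.
    Returns "recrudescence", "new infection", or "mixed".
    """
    # Loci at which at least one haplotype is shared between time points.
    shared_loci = [
        locus
        for locus in day0.keys() & recurrence.keys()
        if day0[locus] & recurrence[locus]
    ]
    if not shared_loci:
        return "new infection"
    # Any recurrence haplotype not seen at D0 suggests an additional new clone.
    has_new = any(
        recurrence[locus] - day0.get(locus, set()) for locus in recurrence
    )
    return "mixed" if has_new else "recrudescence"


if __name__ == "__main__":
    d0 = {"ama1": {"h1", "h2"}, "cpmp": {"c1"}}
    print(classify_recurrence(d0, {"ama1": {"h1"}, "cpmp": {"c1"}}))
    print(classify_recurrence(d0, {"ama1": {"h3"}, "cpmp": {"c2"}}))
    print(classify_recurrence(d0, {"ama1": {"h1", "h3"}, "cpmp": {"c1"}}))
```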

Application in Trial Analysis: The nanopore AmpSeq assay consistently distinguished recrudescence from new infections in 17 out of 20 (85%) paired patient samples when data from all six markers were considered [3]. This provides a rapid, corrected estimate of drug failure, which is the cornerstone for reporting therapeutic efficacy to regulatory bodies like the WHO.

The multiplex PCR panel coupled with nanopore sequencing represents a significant advancement over traditional genotyping methods for antimalarial clinical trials. Its high sensitivity, specificity, and robustness, combined with the portability and speed of the ONT platform, make it an ideal tool for obtaining rapid, genotype-corrected drug efficacy estimates. This is particularly crucial for monitoring the spread of antimalarial drug resistance in endemic settings. By providing a detailed protocol and demonstrating its rigorous performance metrics, this application note empowers researchers to implement this powerful methodology, ultimately contributing to more accurate assessments of antimalarial drug efficacy and improved public health outcomes.

Adaptive sampling is a powerful computational enrichment technique available on Oxford Nanopore Technologies (ONT) sequencing platforms that enables real-time, in silico selection of DNA molecules during the sequencing process. Unlike physical enrichment methods that require additional laboratory steps, adaptive sampling performs enrichment computationally by rejecting uninteresting DNA sequences from nanopores as sequencing occurs [2] [29]. This method is particularly valuable for parasite genotyping research, where target organisms like Plasmodium falciparum often constitute a small fraction of the total DNA in clinical samples, making genomic surveillance challenging and costly [30].

The fundamental principle of adaptive sampling involves the real-time analysis of the initial 200-500 base pairs of each DNA read as it enters a nanopore. This sequence "prefix" is basecalled and compared against a reference database of target sequences. Based on this comparison, the software decides whether to continue sequencing the molecule or to eject it from the pore by reversing the voltage, thereby freeing up the nanopore for more valuable molecules [2] [31]. This dynamic, data-driven approach allows researchers to focus sequencing resources on genomic regions of interest, significantly improving the efficiency and cost-effectiveness of studying challenging samples where target DNA is scarce or overwhelmed by host genetic material.

Technical Foundations and Workflow

The adaptive sampling process integrates seamlessly with the standard nanopore sequencing workflow, requiring minimal modifications to library preparation while leveraging specialized software components for real-time decision making.

Core Components and Mechanism

The adaptive sampling system relies on three key technological elements working in concert:

Real-time basecalling: As DNA strands enter nanopores, the ionic current signals are immediately converted to nucleotide sequences using recurrent neural network algorithms. ONT's Dorado basecaller implements bi-directional recurrent neural networks that achieve high accuracy (Q20+), with specialized models like Super Accurate (SUP) providing the precision needed for reliable sequence identification [5].
Reference-based classification: The basecalled sequence prefixes are rapidly aligned against user-defined reference sequences using optimized mapping tools. MinKNOW's implementation utilizes minimap2 for this purpose, while alternative tools like ReadBouncer employ k-mer-based pseudo-mapping with interleaved Bloom filters for classification [31].
Voltage-mediated rejection: When a read is classified as non-target, the software applies a brief voltage reversal to eject the molecule from the pore, typically within 0.5-2 seconds of sequencing initiation. This rapid decision-making minimizes time spent on unwanted sequences while preserving pore availability for target molecules [31].
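The accept-or-reject decision can be illustrated with a toy classifier. This sketch is not how MinKNOW or ReadBouncer are implemented: it replaces minimap2 alignment and interleaved Bloom filters with a plain Python set of k-mers, and the function names, k-mer size, and match threshold are hypothetical choices. It only shows the logic of classifying a basecalled prefix against target sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(targets, k=15):
    """Collect all k-mers from both strands of the target sequences."""
    index = set()
    for seq in targets:
        for strand in (seq.upper(), reverse_complement(seq.upper())):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def decide(prefix, index, k=15, min_fraction=0.5):
    """Return "continue" for on-target prefixes and "eject" otherwise.

    k must match the value used to build the index.
    """
    prefix = prefix.upper()
    kmers = [prefix[i:i + k] for i in range(len(prefix) - k + 1)]
    if not kmers:
        return "continue"  # too short to judge; keep sequencing for now
    hits = sum(kmer in index for kmer in kmers)
    return "continue" if hits / len(kmers) >= min_fraction else "eject"


if __name__ == "__main__":
    target = "ATGAAGATCCTGTTCGCTAGCTTAGGCATCGATCGGATTACAGCTAGCTAGGATCCA"
    index = build_kmer_index([target])
    print(decide(target[5:45], index))  # prefix drawn from the target
    print(decide("T" * 40, index))      # unrelated sequence
```

In a real run, an "eject" decision corresponds to the voltage reversal described above, and the decision must be made quickly enough that little sequencing time is spent on off-target molecules.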

Workflow Implementation

The following diagram illustrates the complete adaptive sampling workflow, from sample preparation through data analysis:

Figure 1: Adaptive Sampling Workflow. The process integrates wet-lab preparation with real-time computational decisions to enrich target sequences.

Software Tools for Adaptive Sampling

Multiple software options implement adaptive sampling with different algorithmic approaches:

Table 1: Comparison of Adaptive Sampling Implementation Tools

Tool	Classification Method	Key Features	Use Cases
MinKNOW Integrated	Read mapping via minimap2	Seamless integration, no additional software	General purpose target enrichment
ReadBouncer	K-mer matching with interleaved Bloom filters	Fast classification, reduced computational load	High-throughput applications
UNCALLED	Dynamic time warping of raw signals	No basecalling required, potentially faster	Resource-constrained environments
SquiggleNet/DeepSelectNet	Deep learning on raw signals	Potential for higher accuracy with training	Specialized applications requiring maximal accuracy

The comparison reported in [31] found that basecalling-assisted tools (MinKNOW and ReadBouncer) generally provide higher classification accuracy than signal-based approaches, making them preferable for most parasite genotyping applications.

Application to Malaria Parasite Genotyping

Adaptive sampling offers particular advantages for malaria research, where Plasmodium falciparum genomes are often outnumbered by human host DNA in clinical samples. The technology enables targeted sequencing of parasite-specific genomic regions without physical separation methods, which can be labor-intensive and introduce biases.

Enrichment Performance and Efficiency

The effectiveness of adaptive sampling for enriching low-abundance targets has been quantitatively demonstrated in multiple studies:

Table 2: Quantitative Performance of Adaptive Sampling for Target Enrichment

Study	Sample Type	Target	Enrichment Factor	Key Metrics
Plasmid Enrichment [31]	Bacterial isolates	Plasmids	6.7x	Increased plasmid abundance from 3.68% to 24.75%
Malaria Surveillance [30]	Dried blood spots	P. falciparum drug resistance genes	~20x	Cost: <$25/sample, turnaround: <29 hours
Respiratory Pathogens [32]	Clinical samples	Microbial pathogens	N/A	Detected 42 additional pathogens missed by standard tests
Paediatric Cancer [32]	Tumour samples	380 cancer genes	~165x on-target coverage	Identified 95% of known fusions and 94% of SNVs
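The enrichment factor reported in Table 2 is the ratio of the target fraction with adaptive sampling to the target fraction without it. As a worked check using the plasmid figures from the table (the function name is illustrative):

```python
def enrichment_factor(baseline_fraction, enriched_fraction):
    """Fold enrichment of target reads or bases relative to a control run."""
    if baseline_fraction <= 0:
        raise ValueError("baseline fraction must be positive")
    return enriched_fraction / baseline_fraction


if __name__ == "__main__":
    # Plasmid abundance rose from 3.68% to 24.75% of the data.
    print(round(enrichment_factor(3.68, 24.75), 1))  # 6.7
```

Both values must be expressed in the same units (here, percentages of the total data) for the ratio to be meaningful.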

In one notable application, researchers used adaptive sampling for genomic surveillance of Plasmodium falciparum in sub-Saharan Africa, processing over 1,000 dried blood spots across eight countries [11]. The method successfully detected drug-resistance mutations and diagnostic test-evading gene deletions with high accuracy, demonstrating its utility in resource-limited settings where rapid, local genomic surveillance is most needed.

Wet-Lab Protocol for Targeted Malaria Genotyping

This protocol describes an optimized workflow for targeted sequencing of Plasmodium falciparum from dried blood spots using adaptive sampling, adapted from the NOMADS (NMEC-Oxford Malaria Amplicon Drug-resistance Sequencing) approach [30].

Sample Preparation and DNA Extraction

Materials:

Dried blood spot (DBS) samples on filter paper
QIAamp DNA Blood Mini Kit (Qiagen) or similar
Phosphate-buffered saline (PBS)
Proteinase K
Elution buffer (10 mM Tris-HCl, pH 8.5)

Procedure:

Punch 3-6 mm discs from DBS using a sterile hole punch.
Place discs in a 1.5 mL microcentrifuge tube and add 180 µL of PBS.
Incubate at room temperature for 30 minutes with occasional vortexing.
Add 25 µL of Proteinase K and 200 µL of AL buffer, then vortex thoroughly.
Incubate at 56°C for 30 minutes, then at 95°C for 10 minutes.
Add 200 µL of ethanol (96-100%) and mix by vortexing.
Transfer the mixture to a QIAamp Mini spin column and centrifuge at 8,000 × g for 1 minute.
Wash with 500 µL AW1 buffer, centrifuge at 8,000 × g for 1 minute.
Wash with 500 µL AW2 buffer, centrifuge at 14,000 × g for 3 minutes.
Elute DNA with 50-100 µL elution buffer, incubate at room temperature for 5 minutes, then centrifuge at 8,000 × g for 1 minute.
Quantify DNA using fluorometry (Qubit dsDNA HS Assay Kit).

Selective Whole Genome Amplification (Optional)

For samples with very low parasitemia (<100 parasites/µL), a reduced-volume selective whole genome amplification (sWGA) is recommended to enrich for Plasmodium DNA while minimizing human background amplification.

Materials:

Plasmodium-selective sWGA primers [30]
REPLI-g Single Cell Kit (Qiagen)
Nuclease-free water

Procedure:

Prepare sWGA reaction mix:
- 1× REPLI-g SC Reaction Buffer
- 40 µM Plasmodium-selective primer pool
- 5-10 ng extracted DNA
- Nuclease-free water to 10 µL
Denature at 95°C for 5 minutes, then hold at 4°C.
Add 10 µL REPLI-g SC Enzyme Mix, mix gently.
Incubate at 30°C for 4-6 hours, then 65°C for 5 minutes to inactivate the enzyme.
Purify amplified DNA using AMPure XP beads (0.8× ratio) and elute in 25 µL elution buffer.

Library Preparation and Sequencing with Adaptive Sampling

Materials:

Ligation Sequencing Kit (SQK-LSK114)
Native Barcoding Expansion (EXP-NBD114)
Flow Cell (R10.4.1 or newer)
MinION or PromethION sequencer

Procedure:

Prepare DNA library according to ONT's protocol for ligation sequencing with barcoding.
During sequencing setup in MinKNOW, select "Adaptive Sampling" and upload reference files.
Configure adaptive sampling parameters:
- Reference file: FASTA containing target sequences (e.g., drug resistance genes, vaccine targets)
- BED file: Genomic coordinates of regions to enrich (optional)
- Classification strategy: "Readfish" with "enrich" mode for targets
Start the sequencing run with standard parameters for your flow cell type.
Monitor enrichment efficiency in real-time through MinKNOW's live analysis dashboard.

Bioinformatic Analysis Pipeline

Following sequencing, a specialized bioinformatic workflow processes the enriched data:

Figure 2: Bioinformatic Analysis Pipeline for processed data from adaptive sampling experiments.

Key Analysis Steps:

Basecalling: Process raw signals using Dorado with Super Accurate model (sup) for highest accuracy [5].
Quality Control: Filter reads by quality (Q-score >10) and length using NanoPlot and NanoFilt.
Alignment: Map reads to reference genome using minimap2 with parameters optimized for Plasmodium (high A-T content).
Variant Calling: Identify SNPs and indels using ClairS or DeepVariant with Plasmodium-trained models.
Structural Variant Detection: Call large deletions (e.g., hrp2/3) using Sniffles or CuteSV.
Haplotype Analysis: Resolve complex infections using tools like DEploid for polyclonal samples.
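The quality-control step depends on how a per-read mean Q-score is computed. Because Phred scores are logarithmic, a common convention is to average the per-base error probabilities and convert the result back to a Q-score, rather than averaging the Q-scores directly. The sketch below assumes that convention and standard Phred+33 FASTQ encoding; the minimum length of 200 bases is a placeholder, since the text gives no length threshold.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean read quality computed from averaged error probabilities."""
    if not quality_string:
        raise ValueError("empty quality string")
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(quality_string, min_q=10.0, min_length=200):
    """Apply a Q-score > min_q and length >= min_length filter to one read."""
    if len(quality_string) < min_length:
        return False
    return mean_qscore(quality_string) > min_q


if __name__ == "__main__":
    # Half the bases at Q40 ('I') and half at Q10 ('+').
    mixed = "I" * 150 + "+" * 150
    print(round(mean_qscore(mixed), 2))
    print(passes_filter(mixed))
```

For the mixed-quality read in the example, the arithmetic mean of the Q-scores would be 25, but the probability-based mean is close to 13, which better reflects the expected error rate of the read.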

Research Reagent Solutions

Successful implementation of adaptive sampling requires careful selection of reagents and tools optimized for parasite genotyping applications.

Table 3: Essential Research Reagents and Tools for Adaptive Sampling

Category	Specific Product/Kit	Function	Parasite Genotyping Considerations
DNA Extraction	QIAamp DNA Blood Mini Kit	Isolation of high-quality DNA from blood samples	Optimized for low parasitemia samples; effective with dried blood spots
Whole Genome Amplification	REPLI-g Single Cell Kit	Whole genome amplification of limited DNA	Selective primers improve Plasmodium enrichment over human DNA
Library Preparation	Ligation Sequencing Kit (SQK-LSK114)	Preparation of sequencing libraries	Preserves long fragments; compatible with adaptive sampling
Barcoding	Native Barcoding Expansion (EXP-NBD114)	Sample multiplexing	Enables pooling of multiple samples; reduces per-sample cost
Flow Cells	R10.4.1 flow cells	Sequencing platform	Improved homopolymer accuracy; better for AT-rich Plasmodium genome
Basecalling	Dorado basecaller SUP model	Signal to sequence conversion	High accuracy needed for reliable variant calling
Analysis Tools	multiply software	Multiplex PCR design	Enables custom panel design for specific research questions

Adaptive sampling represents a significant advancement for in silico enrichment in challenging samples, particularly for parasite genotyping research where target DNA is often scarce and overwhelmed by host genetic material. By enabling real-time, computational selection of DNA molecules during sequencing, this method provides researchers with a powerful tool to focus sequencing resources on genomic regions of interest without additional wet-lab steps.

The application of adaptive sampling to malaria research has demonstrated substantial benefits, including enhanced detection of drug-resistance mutations, identification of diagnostic test-evading gene deletions, and improved cost-effectiveness for genomic surveillance in resource-limited settings. As the technology continues to mature with improvements in basecalling accuracy, classification algorithms, and user-friendly implementations, adaptive sampling is poised to become an indispensable tool in the parasite genotyping toolkit, enabling more efficient and targeted genomic investigations that were previously limited by technical and economic constraints.

16S rRNA Profiling for Respiratory Microbiome Studies in Parasitic Diseases

The study of the respiratory microbiome represents a paradigm shift in understanding pulmonary health and disease. While the lungs were historically considered sterile, advanced sequencing technologies have revealed complex microbial communities that influence host immunity and disease outcomes [33]. This application note provides a detailed protocol for 16S rRNA profiling of respiratory samples in the context of parasitic diseases, utilizing long-read nanopore sequencing to enhance strain-level resolution and enable more accurate characterization of microbial communities. The methodologies outlined herein are designed to integrate with broader parasite genotyping research frameworks, facilitating comprehensive analysis of host-microbe-parasite interactions in the respiratory tract.

The respiratory microbiome encompasses all microorganisms residing in the respiratory tract, including bacteria, archaea, fungi, and viruses [33]. In healthy states, the lung microbiome is maintained through a balance of three key ecological processes: microbial immigration, elimination, and local reproduction [34]. The composition is dynamic and influenced by factors including environmental exposures, host immunity, and anatomical factors [34] [33]. Table 1 summarizes the core bacterial genera typically found in healthy lower airways.

Table 1: Core Bacterial Genera in Healthy Lower Airways

Phylum	Common Genera	Oxygen Requirement	Relative Abundance
Firmicutes	Streptococcus, Veillonella	Facultative anaerobic	High
Bacteroidetes	Prevotella, Porphyromonas	Anaerobic	High
Proteobacteria	Pseudomonas, Haemophilus	Aerobic	Moderate
Actinobacteria	Propionibacterium	Aerobic	Low

In parasitic respiratory diseases, this delicate balance is disrupted, leading to dysbiosis that may both result from and contribute to disease pathogenesis [34]. The application of long-read 16S rRNA sequencing enables researchers to move beyond genus-level identification toward strain-level characterization, which is particularly valuable for detecting low-abundance pathogens and understanding functional adaptations within the microbial community.

Methodological Framework

Conceptual Modeling and Hypothesis Framing

Prior to initiating respiratory microbiome studies, researchers should articulate clear conceptual models and hypotheses. Three core hypotheses (Figure 1) guide most investigations:

Exposure-Mediated Dysbiosis: Parasitic infections directly alter the lung microbiome, which subsequently mediates inflammatory responses and tissue injury [34].
Disease-Mediated Dysbiosis: Respiratory pathology from parasitic infections creates environmental changes that select for altered microbial communities [34].
Bidirectional Reinforcement: Once established, dysbiosis and parasitic disease perpetuate each other in a positive-feedback loop [34].

Explicit articulation of the core hypothesis and proposed ecological mechanism ensures appropriate study design, analysis selection, and interpretation clarity [34].

Sample Collection Considerations

Respiratory microbiome research employs various sample types, each with advantages and limitations:

Table 2: Respiratory Sample Types for Microbiome Studies

Sample Type	Advantages	Limitations	Recommended Applications
Bronchoalveolar Lavage (BAL)	Direct sampling of lower airways, reduced oropharyngeal contamination	Invasive procedure, requires clinical setting	Studies of alveolar compartment, immunocompromised patients
Sputum	Non-invasive, suitable for serial sampling	Potential oropharyngeal contamination, may not represent distal airways	Chronic parasitic diseases (e.g., paragonimiasis), longitudinal studies
Protected Specimen Brush	Reduced contamination, site-specific sampling	Limited biomass, invasive procedure	Targeted regional sampling, research settings
Nasopharyngeal Swab	Minimal invasion, suitable for all ages	Upper respiratory tract only, may not reflect lung microbiota	Pediatric studies, screening applications

Sample collection must account for potential confounding factors including recent antibiotic exposure, immunosuppression, demographic variables, and geographic factors [34]. Protective bronchial sampling techniques such as wax-sealed catheters are recommended to minimize oropharyngeal contamination during bronchoscopy [33].

DNA Extraction and Library Preparation

The low microbial biomass of respiratory samples presents unique challenges for DNA extraction. The following protocol is optimized for nanopore sequencing of respiratory specimens:

Step 1: DNA Extraction

Use mechanical lysis combined with enzymatic digestion for maximal DNA yield
Include negative extraction controls to monitor contamination
Employ extraction kits validated for low-biomass samples
Quantify DNA using fluorometric methods (e.g., Qubit) rather than spectrophotometry

Step 2: 16S rRNA Gene Amplification

Target the full-length 16S rRNA gene (∼1,500 bp) using primers 27F (5'-AGRGTTTGATYMTGGCTCAG-3') and 1492R (5'-RGYTACCTTGTTACGACTT-3')
Perform PCR in triplicate 25-μL reactions containing:
- 10-100 ng template DNA
- 1X LongAmp Taq Master Mix
- 0.2 μM forward and reverse primers
- Nuclease-free water to volume
Cycling conditions: 95°C for 3 min; 30 cycles of 95°C for 30 s, 55°C for 45 s, 65°C for 2 min; final extension at 65°C for 5 min
Pool triplicate reactions and purify using AMPure XP beads

Step 3: Library Preparation

Use the Native Barcoding Kit 96 (SQK-NBD114.96) for multiplexed samples
Follow manufacturer's instructions with these modifications:
- Use 100-200 ng purified PCR product per sample
- Extend repair incubation time to 30 minutes for complex samples
- Pool barcoded libraries in equimolar ratios
Quality assessment: Validate library size distribution using Fragment Analyzer or Bioanalyzer

Sequencing and Basecalling

Utilize MinION Mk1C with R10.4.1 flow cells for improved accuracy
Load 50-100 fmol of the pooled library per flow cell
Run MinKNOW software (v24.06.15 or higher) for sequencing
Perform real-time basecalling using the super-accurate (sup) model with minimum Q-score of 20 (accuracy ≥99%) [3]
Target approximately 25,000 reads per sample to ensure adequate coverage after quality filtering [3]
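The 50-100 fmol loading target is a molar quantity, whereas fluorometric quantification reports mass. The conversion below is a sketch that assumes double-stranded DNA with an average mass of about 660 g/mol per base pair, a widely used approximation; the function names are illustrative.

```python
def ng_to_fmol(mass_ng, length_bp, bp_mass=660.0):
    """Convert a dsDNA mass in ng to fmol for fragments of a given length."""
    return mass_ng * 1e6 / (length_bp * bp_mass)


def ng_for_fmol(fmol, length_bp, bp_mass=660.0):
    """Mass in ng needed to reach a target amount in fmol."""
    return fmol * length_bp * bp_mass / 1e6


if __name__ == "__main__":
    amplicon_bp = 1500  # approximate full-length 16S amplicon
    low = ng_for_fmol(50, amplicon_bp)
    high = ng_for_fmol(100, amplicon_bp)
    print(f"{low:.1f}-{high:.1f} ng for 50-100 fmol")
```

For amplicons of roughly 1,500 bp, the 50-100 fmol range therefore corresponds to about 50-100 ng of pooled library under this approximation.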

Bioinformatic Analysis Pipeline

The compositional nature of microbiome data requires specialized analytical approaches [35]. The following workflow processes raw sequencing data into biologically meaningful insights:

Figure 1: Bioinformatic workflow for respiratory microbiome data analysis

Key Analytical Considerations

Compositional Data Analysis: Microbiome sequencing data are inherently compositional (relative abundances sum to a constant), requiring specialized statistical approaches [35]. The recommended workflow includes:

Centered Log-Ratio (CLR) Transformation: Transforms relative abundance data from a constrained sample space to the real space, enabling use of standard multivariate methods [35].
Aitchison Distance: Euclidean distance between CLR-transformed samples serves as a compositionally appropriate beta-diversity metric [35].
Singular Value Decomposition (SVD) PCA: Preferred over principal coordinate analysis for ordination of compositional data due to increased stability and reproducibility [35].
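The first two steps can be written in a few lines. This is a minimal sketch using only the standard library: the pseudocount of 0.5 used to handle zero counts is an assumption (zero handling is not specified in the text), and ordination is not shown.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount is added because log(0) is undefined; use
    pseudocount=0 only when no count is zero.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between two CLR-transformed samples."""
    a = clr(counts_a, pseudocount)
    b = clr(counts_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    sample_1 = [10, 20, 70]
    sample_2 = [100, 200, 700]  # same composition, ten times the depth
    sample_3 = [30, 30, 40]
    print(round(aitchison_distance(sample_1, sample_2, pseudocount=0), 6))
    print(round(aitchison_distance(sample_1, sample_3), 3))
```

The first comparison illustrates why the metric suits compositional data: two samples with identical proportions but different sequencing depth are at distance zero.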

Diversity Assessment:

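Alpha-diversity: Calculate using Hill numbers from relative abundances (Hill numbers require non-negative proportions, so they are not computed on CLR-transformed values, which can be negative)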
Beta-diversity: Assess using Aitchison distance with PERMANOVA for group comparisons

Association Analysis: Employ proportionality metrics (ρ or φ) instead of traditional correlation coefficients to avoid spurious associations in compositional data [35].
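As an illustration of a proportionality metric, the sketch below computes ρ for a pair of taxa as 1 - var(A_i - A_j) / (var(A_i) + var(A_j)), where A_i and A_j are the CLR-transformed abundances of taxa i and j across samples. This is the definition commonly associated with the propr package; the function names, input layout, and pseudocount are assumptions made for this example.

```python
import math
from statistics import variance


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def rho(table, i, j, pseudocount=0.5):
    """Proportionality between taxa i and j.

    table: list of samples, each a list of counts per taxon
           (at least two samples are required).
    Returns a value in [-1, 1]; 1 indicates perfect proportionality.
    """
    transformed = [clr(sample, pseudocount) for sample in table]
    a_i = [row[i] for row in transformed]
    a_j = [row[j] for row in transformed]
    diff = [x - y for x, y in zip(a_i, a_j)]
    return 1 - variance(diff) / (variance(a_i) + variance(a_j))


if __name__ == "__main__":
    # Taxon 1 is always exactly twice taxon 0; taxa 2 and 3 vary freely.
    table = [
        [10, 20, 70, 5],
        [20, 40, 30, 50],
        [5, 10, 200, 9],
    ]
    print(round(rho(table, 0, 1, pseudocount=0), 6))
    print(round(rho(table, 0, 2, pseudocount=0), 3))
```

Because the ratio between the first two taxa is constant across samples, their log-ratio has zero variance and ρ equals 1, regardless of how the remaining taxa change.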

Integration with Parasite Genotyping Research

The respiratory microbiome does not exist in isolation but interacts with parasitic infections through multiple mechanisms. Long-read nanopore sequencing enables simultaneous investigation of both microbiome and parasite genetics within the same platform.

Parallel Parasite Genotyping

While 16S rRNA profiling characterizes the bacterial component, targeted amplicon sequencing of parasite genes can be performed concurrently:

Multiplexing Approach: Process respiratory samples for both 16S rRNA and parasite-specific targets (e.g., 18S rDNA for trypanosomatids) in parallel [36]
Shared Bioinformatics: Adapt similar analytical frameworks for both microbiome and parasite sequence data
Integrated Interpretation: Correlate microbial dysbiosis patterns with parasite genetic diversity

This integrated approach is particularly valuable for detecting co-infections and understanding how specific parasite genotypes influence respiratory microbial ecology [36].

The Gut-Lung Axis in Parasitic Diseases

Emerging evidence highlights bidirectional communication between gut and lung microbiomes [33]. Parasitic infections that affect either site can influence the other through immune modulation and microbial translocation. Study designs should consider collecting paired specimens from both sites when feasible to comprehensively evaluate these interactions.

The Scientist's Toolkit

Table 3: Essential Research Reagents and Materials

Item	Function	Specifications	Example Products
DNA Extraction Kit	Nucleic acid purification from respiratory samples	Optimized for low biomass, includes inhibition removal	DNeasy PowerSoil Pro Kit
Long-range PCR Kit	Full-length 16S rRNA gene amplification	High fidelity, efficient with GC-rich templates	LongAmp Taq Master Mix
Native Barcoding Kit	Multiplex sample preparation	96 unique barcodes, compatible with nanopore sequencing	SQK-NBD114.96
Flow Cells	Sequencing matrix	R10.4.1 chemistry for improved accuracy	MinION R10.4.1 Flow Cell
Bioinformatics Tools	Data analysis and visualization	Compositional data analysis capabilities	QIIME 2, R packages (propr, compositions)

Troubleshooting and Quality Control

Contamination Mitigation

Respiratory microbiome studies, particularly those involving low-biomass samples, are vulnerable to contamination:

Include Controls: Process negative extraction controls (nuclease-free water) and positive controls (mock communities) alongside samples
Monitor Reagent Contamination: Bacterial DNA is present in many laboratory reagents and can disproportionately affect low-biomass samples [34]
Batch Effects: Process case and control samples interspersed across DNA extraction batches and sequencing runs to avoid technical confounding

Data Quality Assessment

Sequence Quality: Maintain minimum average Q-score of 20 after basecalling [3]
Sample Inclusion: Establish minimum sequence depth thresholds prior to analysis (typically >10,000 reads per sample after filtering)
Contamination Identification: Identify and remove contaminants using dedicated tools (e.g., Decontam) with inclusion of control samples
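
The quality and depth thresholds above can be sketched in a few lines of Python. This example assumes Phred+33 encoded quality strings and computes the mean read Q-score by averaging error probabilities rather than Q values, which is the usual convention for nanopore reads; the function names are hypothetical.

```python
import math

def mean_qscore(quality_string, offset=33):
    """Mean read Q-score, averaging per-base error probabilities (Phred+33)."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))

def filter_reads(reads, min_q=20.0):
    """Keep (read_id, quality_string) pairs whose mean Q-score meets the cutoff."""
    return [(rid, qual) for rid, qual in reads if mean_qscore(qual) >= min_q]

def passes_depth(read_count, min_reads=10_000):
    """Sample inclusion rule: more than min_reads reads after filtering."""
    return read_count > min_reads
```

Averaging probabilities matters: a read with one Q10 base and one Q40 base has a mean Q-score near 13, not 25, because the low-quality base dominates the expected error.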

16S rRNA profiling using long-read nanopore sequencing provides a powerful approach for characterizing the respiratory microbiome in parasitic diseases. Full-length 16S rRNA gene sequencing enabled by this technology offers superior taxonomic resolution compared to short-read methods, while the portability and decreasing costs make it increasingly accessible for research in endemic areas. By integrating respiratory microbiome assessment with parasite genotyping within a unified analytical framework, researchers can uncover novel interactions between parasites, commensal microbes, and host immunity that may inform new diagnostic and therapeutic strategies.

Solving Common Challenges: A Troubleshooting Guide for Robust Parasite Genotyping

In the evolving field of parasite genomics, the adoption of long-read nanopore sequencing has revolutionized research by enabling real-time, field-deployable whole genome sequencing and providing unparalleled access to complex genomic regions [19] [37]. However, the success of these sophisticated applications hinges on a fundamental preliminary step: the precise quantification of input DNA. Traditional spectrophotometric methods like NanoDrop measurements often overestimate DNA concentration by detecting all nucleic acids, including single-stranded DNA, RNA, and free nucleotides, without distinguishing double-stranded DNA (dsDNA) specifically [38]. This overestimation can critically impact sequencing outcomes, as nanopore sequencing requires precise input of high-quality, high molecular weight (HMW) dsDNA for optimal library preparation [39] [40].

Fluorometric quantification has emerged as an essential technique for parasite genotyping research because it utilizes dsDNA-specific fluorescent dyes that selectively bind to dsDNA molecules, providing accurate measurements of the actual available template for sequencing reactions [38] [41]. This application note details the critical importance of fluorometric DNA quantification within the context of parasite research utilizing nanopore sequencing technologies, providing comparative data, detailed protocols, and practical recommendations to ensure sample quality and sequencing success.

Comparative Analysis of DNA Quantification Methods

Fundamental Technical Differences

The core distinction between spectrophotometric and fluorometric quantification lies in their mechanism of detection and specificity. Spectrophotometric instruments measure ultraviolet light absorption at 260 nm, where all nucleic acids (dsDNA, ssDNA, RNA) absorb light, leading to potential overestimation of available dsDNA template [38]. Additionally, they provide purity ratios (A260/A280 and A260/A230) that can indicate contamination [38].

In contrast, fluorometric methods employ dsDNA-binding fluorophores that emit light at characteristic spectra upon binding. The emitted fluorescence is proportional to dsDNA concentration in the sample, providing specific quantification of the molecules relevant to sequencing applications [38] [41]. Fluorometry is also approximately 1,000 times more sensitive than absorbance-based methods, which is particularly crucial when working with the mass-limited clinical samples common in parasite research [41].

Quantitative Performance Comparison

Recent systematic comparisons of DNA quantification methods reveal significant differences in performance characteristics. In a study evaluating seven different DNA samples analyzed by three independent researchers, fluorometric methods consistently provided more reliable measurements of known DNA concentrations compared to spectrophotometry [38].

Table 1: Comparative Performance of DNA Quantification Methods

Method	Principle	dsDNA Specificity	Sensitivity Range	Purity Assessment	Key Limitations
Spectrophotometry (NanoDrop)	UV absorption at 260 nm	Non-specific	~2 ng/μL to 3700 ng/μL [41]	Yes (A260/280, A260/230 ratios)	Overestimates concentration due to non-specific detection [38]
Fluorometry (Qubit dsDNA HS)	Fluorometric dye intercalation	High	0.01 ng/μL to 100 ng/μL [38]	No	Cannot detect contaminants; requires standards [38]
Fluorometry (AccuGreen HS)	Fluorometric dye intercalation	High	0.1 ng/μL to 10 ng/μL [38]	No	Limited dynamic range; requires standards [38]

The critical finding from comparative studies is that for most samples, "compared to the fluorometric kits, the used spectrophotometric instrument in the case of fish DNA samples tends to overestimate the DNA concentration" [38]. This overestimation can lead to insufficient DNA input for nanopore library preparation, resulting in suboptimal sequencing performance, particularly for challenging applications like parasite genotyping where sample quantity is often limited.

Recommended Fluorometric Quantification Protocol for Parasite DNA

This standardized protocol is optimized for quantifying parasite genomic DNA intended for nanopore sequencing applications, incorporating best practices from recent methodological comparisons.

Materials and Equipment

Fluorometer: Qubit Fluorometer (Thermo Fisher) or equivalent capable of detecting dsDNA
Quantification assay: Qubit dsDNA High Sensitivity (HS) Assay kit (catalog number Q32851) or AccuGreen High Sensitivity kit
Parasite DNA samples: Extracted using methods that preserve HMW DNA (e.g., ZymoBIOMICS DNA Miniprep Kit [40])
Laboratory equipment: Microcentrifuge, vortex mixer, timer, microcentrifuge tubes
Pipettes and tips: Accurate pipettes capable of dispensing 1-20 μL volumes

Step-by-Step Working Procedure

Preparation of Working Solution:
- Calculate the required volume of working solution (200 μL per standard or sample).
- Prepare the working solution by diluting the fluorometric dsDNA dye 1:200 in the provided buffer.
- Vortex the working solution thoroughly for 30 seconds to ensure homogeneity.
Standard Curve Preparation:
- Label two tubes for standards (#1 and #2).
- Pipette 190 μL of working solution into each standard tube.
- Add 10 μL of standard #1 (0 ng/μL) to tube #1 and 10 μL of standard #2 (10 ng/μL) to tube #2.
- Vortex each tube for 3 seconds and incubate at room temperature for 2 minutes.
Sample Preparation:
- Label Qubit tubes for each parasite DNA sample and control.
- Add 199 μL of working solution to each sample tube.
- Add 1 μL of parasite DNA sample to the appropriate tube (for the HS assay).
- Vortex tubes for 3 seconds and incubate at room temperature for 2 minutes.
Measurement Procedure:
- Turn on the fluorometer and select the appropriate assay (dsDNA HS).
- Calibrate the instrument using the two standards as directed.
- Once calibrated, measure each sample in sequence, recording the concentration values.
- If readings fall outside the linear range, dilute samples appropriately and repeat measurement.
Data Interpretation:
- Compare fluorescence readings of samples to the standard curve.
- Account for any dilution factors in final concentration calculations.
- For nanopore sequencing, ensure concentrations fall within the recommended input range (e.g., 20-100 ng/μL for many library prep protocols).
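
The arithmetic in this procedure can be sketched as follows. The 1:200 dye dilution, the 200 μL per-tube volume, and the 20-100 ng/μL example input range come from the steps above; the 10% pipetting overage and all function names are assumptions made for illustration.

```python
def working_solution_volumes(n_samples, n_standards=2, per_tube_ul=200.0, overage=0.1):
    """Return (dye_ul, buffer_ul) for a 1:200 dye dilution.

    `overage` is an assumed safety margin for pipetting losses.
    """
    n_tubes = n_samples + n_standards
    total_ul = n_tubes * per_tube_ul * (1 + overage)
    dye_ul = total_ul / 200
    return dye_ul, total_ul - dye_ul

def apply_dilution(reading_ng_per_ul, dilution_factor=1.0):
    """Back-calculate the stock concentration of a sample diluted before measurement."""
    return reading_ng_per_ul * dilution_factor

def within_input_range(conc_ng_per_ul, low=20.0, high=100.0):
    """Check a concentration against the example library-prep input range."""
    return low <= conc_ng_per_ul <= high

dye, buffer = working_solution_volumes(n_samples=8)
print(f"dye: {dye:.1f} uL, buffer: {buffer:.1f} uL")
```

A sample diluted 1:10 before measurement that reads 12.5 ng/μL has a stock concentration of 125 ng/μL, which would need dilution before entering a protocol with a 20-100 ng/μL input range.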

Figure 1: Fluorometric DNA Quantification Workflow for Nanopore Sequencing

Quality Control Considerations

Replication: All samples should be measured in technical duplicate or triplicate to ensure precision.
Negative Control: Include a negative control (elution buffer or nuclease-free water) to confirm absence of background fluorescence.
Sample Volume: For low-concentration parasite DNA samples, the protocol can be scaled to use larger sample volumes (up to 20 μL) with correspondingly less working solution.
Dye Compatibility: Ensure fluorometric dyes are compatible with the extraction buffers used, as some reagents may quench fluorescence.

Integration with DNA Extraction Methods for Parasite Research

The accuracy of fluorometric quantification is profoundly influenced by upstream DNA extraction methods. Recent comparative studies have demonstrated that different extraction protocols yield DNA with varying quality, quantity, and fragment length distributions, all of which impact downstream nanopore sequencing performance [39] [40].

For parasite research requiring HMW DNA, gentle lysis methods that minimize mechanical disruption (e.g., enzymatic lysis) generally produce longer DNA fragments better suited to long-read sequencing. A comprehensive evaluation of six DNA extraction methods for nanopore sequencing found that the Quick-DNA HMW MagBead Kit (Zymo Research) produced the highest yield of pure HMW DNA and enabled accurate detection of bacterial species in a complex mock community [39]. Similarly, in a study comparing extraction methods for pathogenic bacteria, the Nanobind CBB Big DNA Kit yielded the longest raw read N50 lengths (>6,000 bp for some species), which is critical for resolving complex genomic regions in parasites [40].
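
Read N50, the metric used in the comparison above, is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal Python sketch (the function name is hypothetical):

```python
def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

# Hypothetical read lengths in bp
print(read_n50([1200, 3500, 6100, 8000, 15000]))
```

Unlike the mean read length, N50 is weighted by bases, so a small number of very long reads can raise it substantially even when most reads are short.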

Table 2: Research Reagent Solutions for Parasite DNA Extraction and Quantification

Reagent/Kits	Specific Function	Application Notes
Quick-DNA HMW MagBead Kit (Zymo Research)	High molecular weight DNA extraction	Recommended for bacterial metagenomics studies using Nanopore sequencing; produces high yield of pure HMW DNA [39]
Qubit dsDNA HS Assay Kit	Fluorometric DNA quantification	Optimal for samples 0.01-100 ng/μL; provides dsDNA-specific quantification for accurate sequencing input [38]
AccuGreen High Sensitivity Kit	Fluorometric DNA quantification	Suitable for samples 0.1-10 ng/μL; compatible with various fluorometers [38]
Nanobind CBB Big DNA Kit	HMW DNA extraction	Yields longest raw read lengths; excellent for genome assembly [40]
ZymoBIOMICS DNA Miniprep Kit	DNA extraction from microbial communities	Provides higher purity DNA; suitable for mixed pathogen samples [40]

When processing clinical parasite samples, which often contain PCR inhibitors and contaminants, the combination of appropriate extraction methods followed by fluorometric quantification is particularly important. The purity of extracted DNA can be assessed using spectrophotometric measurements (A260/A280 and A260/A230 ratios) alongside fluorometric quantification to obtain a comprehensive quality assessment [38] [40]. For nanopore sequencing of parasites, the recommended approach is "a combination of a spectrophotometric and a fluorometric method for obtaining data on the purity and the dsDNA concentration of a sample" [38].

Implications for Parasite Genotyping Research

Accurate fluorometric quantification provides critical advantages for specific applications in parasite genotyping research using nanopore sequencing:

Detection of Minority Clones and Polyclonal Infections

In malaria research, detecting minority clones in polyclonal infections is essential for distinguishing recrudescence from new infections in antimalarial drug trials. Recent studies demonstrate that nanopore amplicon sequencing can detect minority Plasmodium falciparum clones at mixture ratios as low as 1:100:100:100 in laboratory strain mixtures, with high sensitivity and specificity [3]. This application requires precise DNA quantification to ensure adequate representation of all parasite strains without introducing amplification biases.

Genome Assembly and Structural Variant Detection

For parasite species with complex, repetitive genomes like Trypanosoma cruzi (causing Chagas disease), nanopore sequencing significantly improves genome assembly contiguity. One study reported that incorporating MinION long reads increased the assembly size by approximately 16 Mb and improved the completeness of coding regions for both single-copy genes and repetitive transposable elements [42]. Accurate fluorometric quantification ensures optimal sequencing library preparation, maximizing read length and coverage across these challenging genomic regions.

Field-Based Sequencing Applications

The portability of nanopore sequencing platforms like MinION enables field-deployable whole genome sequencing of malaria parasites [37] [43]. In these resource-limited settings, fluorometric quantification using portable fluorometers provides a robust method for quality control prior to sequencing, helping to prevent costly sequencing failures due to insufficient or degraded input DNA.

Figure 2: Impact of Accurate DNA Quantification on Parasite Genotyping Applications

Fluorometric DNA quantification represents a critical quality control checkpoint in parasite genotyping research utilizing nanopore sequencing technologies. By providing accurate, dsDNA-specific concentration measurements, this method ensures optimal sequencing input that translates to enhanced detection of minority clones, improved genome assembly, and more reliable therapeutic efficacy assessments in antimalarial drug trials. When combined with appropriate HMW DNA extraction methods and spectrophotometric purity assessment, fluorometric quantification forms the foundation of a robust workflow for parasite genomic studies. As nanopore sequencing continues to expand into field-based applications and point-of-care diagnostics, the role of precise DNA quantification will remain indispensable for generating high-quality genomic data to support parasite research and control efforts.

Long-read nanopore sequencing has revolutionized parasite genotyping by enabling the reconstruction of complete genomes and direct detection of genetic variants. However, systematic errors in homopolymer resolution and methylation-mediated basecalling errors present significant challenges for accurate genomic analysis. This application note details experimental and computational strategies to overcome these limitations, featuring optimized wet-lab protocols and bioinformatics workflows specifically validated for parasite research. We demonstrate how dual-constriction nanopores, advanced polishing algorithms, and methylation-aware analysis pipelines can achieve 25-70% improvement in homopolymer accuracy and reliable methylation motif discovery, providing researchers with standardized methods for robust parasite genotyping in drug development studies.

Nanopore sequencing technology has emerged as a powerful tool for parasite genotyping research, offering long reads that span complex genomic regions and enable phased variant detection. However, the electrical signal-based detection method introduces two primary categories of systematic errors that require specialized correction approaches. Homopolymer errors occur when consecutive identical nucleotides are miscounted due to signal saturation, while methylation-mediated errors arise when epigenetic modifications distort current signals in ways unrecognized by standard basecalling models. These challenges are particularly relevant in parasite research, where AT-rich genomes common in organisms like Plasmodium falciparum contain extensive homopolymer regions, and diverse methylation systems can obscure true genetic variation.

The impact of these errors extends beyond basic sequence accuracy, affecting downstream analyses including drug resistance variant calling, strain differentiation, and virulence gene characterization. For researchers conducting antimalarial drug efficacy studies, accurate homopolymer resolution is essential for distinguishing recrudescence from new infections by ensuring reliable microhaplotype calling. Similarly, characterizing parasite methylation patterns provides insights into gene regulation mechanisms that may influence drug response. This application note provides detailed protocols to address these specific challenges through integrated experimental and computational solutions.

Homopolymer Resolution: Mechanisms and Solutions

The Homopolymer Challenge in Parasite Genomes

Homopolymers (stretches of consecutive identical nucleotides) pose a particular challenge for nanopore sequencing because of the signal saturation that occurs when multiple identical nucleotides pass through the nanopore constriction. The electrical current signatures blend together, making it difficult to determine the exact number of bases in the homopolymer stretch. This problem is exacerbated in parasite genomes such as that of Plasmodium falciparum, which has an extreme AT content (approximately 80% genome-wide, approaching 90% in introns and intergenic regions) and consequently contains abundant poly-A and poly-T tracts. Accurate resolution of these regions is critical for parasite genotyping, as errors can lead to frameshifts in coding sequences and misclassification of strains in drug efficacy studies.
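
To gauge how exposed a target region is to this error mode, homopolymer tracts can be located directly in a reference sequence. The sketch below is a simple illustration; the five-base minimum and the function names are assumptions chosen for this example.

```python
from itertools import groupby

def homopolymer_runs(sequence, min_length=5):
    """Return (start, base, length) for each run of identical bases >= min_length."""
    runs = []
    position = 0
    for base, group in groupby(sequence.upper()):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs

def at_content(sequence):
    """Fraction of A and T bases in a sequence."""
    seq = sequence.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

# Hypothetical AT-rich fragment
fragment = "ACGTTTTTTACAAAAAG"
print(homopolymer_runs(fragment))
print(round(at_content(fragment), 2))
```

Amplicons whose diagnostic positions fall inside or next to long poly-A or poly-T tracts are the ones most likely to benefit from the pore and polishing strategies discussed in this section.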

Dual-Constriction Nanopores for Enhanced Resolution

A fundamental advancement in homopolymer resolution came with the development of engineered protein nanopores featuring multiple constriction points. The CsgG nanopore, when complexed with its extracellular interaction partner CsgF, forms a defined second constriction within the β-barrel that improves signal modulation during DNA translocation [44].

Structural Basis: Cryo-EM structure analysis reveals that the 33 N-terminal residues of CsgF bind inside the CsgG β-barrel, creating a sharp 15 Å-wide constriction approximately 25 Å from the primary constriction [44]. This dual-constriction architecture provides two separate measurement points for each DNA molecule, effectively doubling the signal information content for homopolymer regions.

Performance Metrics: Sequencing with the CsgG:CsgF dual-constriction pore demonstrates substantial improvements, with 25-70% enhanced single-read accuracy for homopolymers up to 9 nucleotides in length compared to conventional pores [44]. This improvement is particularly evident in AT-rich contexts relevant to parasite genotyping, where accurate resolution of long homopolymer stretches is essential for microhaplotype-based strain discrimination.

Table 1: Performance Comparison of Nanopore Technologies for Homopolymer Resolution

Pore Type	Constriction Architecture	Read Length	Homopolymer Accuracy (5-9 nt stretches)	Best Applications in Parasite Research
Standard CsgG (R9.4)	Single constriction (Y51, N55, F56)	Ultra-long (>100 kb)	70-85%	Initial assembly, SV detection
CsgG:CsgF complex	Dual constriction (25 Å separation)	Long (10-100 kb)	85-95%	Microhaplotype analysis, drug resistance markers
R10.4 flow cell	Dual reader head	Long to ultra-long	90-97%	Variant phasing, methylation-aware basecalling

Experimental Protocol: Targeted Amplicon Sequencing for Parasite Genotyping

This protocol describes an optimized workflow for targeted amplicon sequencing of Plasmodium falciparum microhaplotypes using the latest nanopore chemistry. It achieves high sensitivity for minority clones in polyclonal infections, a critical requirement for distinguishing recrudescence from new infections in antimalarial drug trials [3].

Sample Preparation and Multiplex PCR

DNA Extraction: Use a silica-based extraction method (e.g., the spin-column QIAamp DNA Micro Kit, or a magnetic bead-based kit) from dried blood spots or whole blood, with elution in 50 μL nuclease-free water. For low parasitemia samples (<100 parasites/μL), include a whole-genome amplification step with 3-5 cycles of amplification to ensure sufficient DNA template.
Primer Design: Select 6-8 highly polymorphic microhaplotype loci with demonstrated genetic diversity in target parasite populations. Optimal amplicon length is 300-600 bp with even amplification efficiency across all targets. Primers should include overhang sequences compatible with nanopore sequencing adapters.
Multiplex PCR Optimization:
- Reaction composition: 2X LongAmp Hot Start Taq Master Mix (12.5 μL), primer pool (2.5 μL of 2 μM each), template DNA (50 ng), nuclease-free water to 25 μL total volume.
- Thermal cycling conditions: Initial denaturation at 95°C for 3 min; 35 cycles of 95°C for 30s, 58°C for 45s, 65°C for 90s; final extension at 65°C for 5 min.
- Validate amplification efficiency and specificity by agarose gel electrophoresis before proceeding to library preparation.

Library Preparation and Sequencing

Library Construction: Use the Native Barcoding Kit 96 V14 (SQK-NBD114.96) following manufacturer's instructions with modifications for enhanced yield:
- Perform end-repair and dA-tailing simultaneously in a 30-minute incubation at 20°C followed by 5 minutes at 65°C.
- Use half-volume reactions for ligation steps to improve efficiency with low-input samples.
- Pool barcoded samples in equimolar ratios based on fluorometric quantification.
Sequencing: Load library onto R10.4.1 flow cells and sequence on MinION Mk1C platform with MinKNOW software (v24.06.15+). Sequence until achieving approximately 25,000 reads per marker per sample (about 150,000 reads per sample for a six-marker panel, or 200,000 for eight markers) to ensure sufficient coverage for minority variant detection.
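
Equimolar pooling from fluorometric concentrations can be sketched as below. The conversion assumes an average mass of 660 g/mol per base pair of dsDNA; the barcode names, concentrations, and the per-barcode target amount are hypothetical values chosen for illustration.

```python
def fmol_per_ul(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/uL to fmol/uL (660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (length_bp * 660)

def pooling_volumes(samples, target_fmol):
    """Volume (uL) of each barcoded sample that contributes target_fmol.

    samples: dict of barcode -> (concentration ng/uL, mean amplicon length bp)
    """
    return {
        barcode: target_fmol / fmol_per_ul(conc, length)
        for barcode, (conc, length) in samples.items()
    }

# Hypothetical barcoded amplicon pools
samples = {"barcode01": (8.4, 450), "barcode02": (15.1, 450), "barcode03": (3.2, 450)}
for barcode, volume in pooling_volumes(samples, target_fmol=20).items():
    print(barcode, f"{volume:.2f} uL")
```

When all amplicon pools have the same mean length, equimolar pooling reduces to equal-mass pooling, so the required volumes are simply inversely proportional to concentration.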

Bioinformatics Analysis

Basecalling and Demultiplexing: Perform super-accuracy simplex basecalling with Dorado (v0.8.2+) using the "sup" model with minimum Q-score of 20.
Variant Calling: Use a customized pipeline that applies rigorous cutoff criteria for haplotype calling, with minority clones reported at frequencies as low as 1% in polyclonal infections.
Quality Control: Include negative controls (nuclease-free water) and positive controls (reference strain) in each sequencing run to monitor contamination and assay performance.
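
The cutoff-based haplotype calling described above can be illustrated with a simple frequency filter. This is a sketch only, not the customized pipeline referred to in the protocol: the 1% minimum frequency follows the text, while the minimum supporting read count and the function name are assumptions.

```python
from collections import Counter

def call_haplotypes(read_haplotypes, min_fraction=0.01, min_reads=10):
    """Return {haplotype: within-sample frequency} for haplotypes passing both cutoffs.

    read_haplotypes: one haplotype label (or sequence) per read at a single marker.
    min_reads is an assumed guard against calls supported by very few reads.
    """
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    calls = {}
    for haplotype, n in counts.items():
        fraction = n / total
        if n >= min_reads and fraction >= min_fraction:
            calls[haplotype] = fraction
    return calls

# Hypothetical marker with one dominant and two minor haplotypes
reads = ["hapA"] * 9800 + ["hapB"] * 150 + ["hapC"] * 50
print(call_haplotypes(reads))
```

At the targeted depth of about 25,000 reads per marker, a 1% minority clone is expected to contribute roughly 250 reads, which leaves room for cutoffs that exclude sporadic error-derived haplotypes.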

Microhaplotype Sequencing Workflow for Parasite Genotyping

Computational Polishing Tools for Homopolymer Correction

Several computational approaches have been developed specifically to address homopolymer errors in nanopore data, with performance varying based on genomic context and sequencing chemistry.

Homopolish is a reference-based polishing tool that uses a support vector machine (SVM) trained on homologous sequences to distinguish systematic errors from true genetic variations [45]. When applied to microbial genomes sequenced with R9.4 flow cells, Homopolish significantly reduces indel errors in homopolymer regions, improving consensus genome quality from Q30 to Q38-Q50. For parasite genomes with close relatives in reference databases, this approach can eliminate nearly 90% of homopolymer errors remaining after initial basecalling.

Performance Validation: In a recent study evaluating Plasmodium falciparum genotyping, the combination of R10.4 flow cells with Homopolish polishing achieved 98% concordance with known microhaplotype sequences, enabling reliable detection of minority clones at 1:100 ratios in polyclonal infections—critical sensitivity for distinguishing recrudescence from new infections in antimalarial drug trials [3].

Methylation-Mediated Errors: Detection and Correction

The Methylation Challenge in Parasite Genotyping

DNA methylation represents a significant source of systematic errors in nanopore sequencing because modified bases produce distinctive current signals that may be misinterpreted by standard basecalling models. These modification-mediated errors are particularly problematic in parasite research, where diverse methylation systems function in regulation of virulence genes and defense against foreign DNA. When uncharacterized methylation patterns occur, they can generate consistent basecalling errors at specific motifs, leading to false variant calls that compromise genotyping accuracy.

The challenge is twofold: first, detecting methylation motifs that may be species-specific or strain-specific; and second, distinguishing true genetic variants from methylation-mediated basecalling errors. This is especially relevant in parasite surveillance, where accurate single-nucleotide variant calling is essential for tracking drug resistance mutations. Recent studies have documented unexpected low-quality genomes (Q26-Q32) in bacterial isolates with novel modification systems, demonstrating how uncharacterized methylation can severely impact sequencing quality despite using latest chemistry and basecallers [46].

Experimental Solutions for Methylation Challenges

Whole-Genome Amplification (WGA) Demodification

For applications where epigenetic information is not required, whole-genome amplification provides an effective wet-lab approach to eliminate methylation-mediated errors by producing modification-free DNA templates [46].

Protocol Details:

Amplification Method: Use phi29 polymerase-based multiple displacement amplification (MDA) with random hexamer primers for uniform genome coverage.
Reaction Conditions: 50-100 ng input DNA, 30°C incubation for 8-16 hours, followed by enzyme inactivation at 65°C for 10 minutes.
Sequencing Considerations: WGA-demodified samples require higher sequencing depth (~100×) compared to standard nanopore sequencing (~30×) due to uneven amplification bias. Expect reduced assembly contiguity at moderate coverage.

Performance Metrics: WGA demodification has been shown to improve genome quality from Q26 to Q53 in isolates with extensive novel modifications, reducing mismatch errors from >5,000 to fewer than 20 in challenging samples [46]. The primary limitations include increased sequencing costs and potential reduction in read yields due to hyperbranched DNA structures.
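
Consensus quality values such as Q26 and Q53 are Phred-scaled, so they translate directly into per-base error rates. The short sketch below shows the conversion; the genome size in the example is a hypothetical value, not one reported in the cited study.

```python
def error_rate(q):
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-q / 10)

def expected_errors(q, genome_size_bp):
    """Expected number of erroneous bases in a consensus of the given size."""
    return error_rate(q) * genome_size_bp

genome_size_bp = 5_000_000  # hypothetical genome size
for q in (26, 53):
    print(f"Q{q}: 1 error per {1 / error_rate(q):,.0f} bases, "
          f"~{expected_errors(q, genome_size_bp):,.0f} errors expected")
```

On this scale, Q26 corresponds to about one error per 400 bases and Q53 to about one error per 200,000 bases, which is why a shift of this size changes variant-calling reliability so markedly.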

Bioinformatics Tools for Methylation Discovery and Error Correction

Modpolish: Reference-Based Correction

Modpolish is a computational method specifically designed to correct modification-mediated errors without prior knowledge of the modification types [46]. The tool leverages basecalling quality, basecalling consistency, and evolutionary conservation to identify and correct systematic errors while retaining true genetic variation.

Implementation Protocol:

Input Requirements: Assembled genome, raw nanopore reads (BAM format), and closely-related reference genomes (default: top 20 genomes with ≥95% identity).
Execution: Run Modpolish after initial Medaka polishing using default parameters. The algorithm identifies loci with consistently low basecall quality despite high coverage, then compares these regions to homologous sequences in reference genomes.
Performance: In isolates with novel modification systems, Modpolish improves genome quality from Q27-Q34 to Q60, reducing mismatch errors by 98.8% while correctly preserving true strain-specific variants [46].

MIJAMP: Methylation Motif Discovery

For researchers requiring complete methylation characterization, MIJAMP (MIJAMP Is Just A MethylBED Parser) provides a software solution for discovering methylated motifs from nanopore sequencing data [47]. The tool employs a human-driven refinement strategy that empirically validates all motifs against genome-wide methylation data, eliminating incorrect motif calls.

Workflow Overview:

Input Preparation: Basecall native DNA sequencing data using Dorado with modified base models (5mC, 6mA, 4mC).
Motif Discovery: Process modified base calls through MIJAMP's interactive workflow to identify significantly methylated motifs.
Validation: Experimentally confirm methylation patterns by comparing with WGA-demodified control data.
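
The idea of validating a candidate motif against genome-wide methylation calls can be illustrated with a simple calculation: the fraction of motif occurrences whose target base is flagged as modified. This is not MIJAMP's algorithm. It is a forward-strand-only sketch with hypothetical inputs and function names, in which the set of methylated positions stands in for thresholded modified-base calls.

```python
import re

def motif_methylation_fraction(genome, motif, offset, methylated_positions):
    """Fraction of motif occurrences whose base at `offset` is flagged methylated.

    Forward strand only; overlapping motif occurrences are counted.
    """
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    sites = [match.start() + offset for match in pattern.finditer(genome)]
    if not sites:
        return 0.0
    hits = sum(1 for site in sites if site in methylated_positions)
    return hits / len(sites)

# Hypothetical sequence with three GATC sites; the A (offset 1) is the target base
genome = "GATCAAGATCTTGATC"
methylated = {1, 7}
print(motif_methylation_fraction(genome, "GATC", 1, methylated))
```

A genuine motif is expected to show a high methylated fraction in native DNA and a fraction near zero in the WGA-demodified control, which is the comparison the validation step relies on.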

Table 2: Comparison of Methods for Addressing Methylation-Mediated Errors

Method	Mechanism	Epigenetic Information Retained	Accuracy Improvement	Best Use Cases
WGA Demodification	Physical removal of modifications via amplification	No	Q26 to Q53	Surveillance studies focusing solely on genetic variants
Modpolish	Computational correction using homologous sequences	Yes	Q27-Q34 to Q60	Population genomics, strain differentiation
Basecalling with Dorado modified models	Improved signal interpretation	Yes	Raw read Q-score improvement 5-10 points	Epigenetic studies, functional genomics
R10.4.1 flow cells	Enhanced signal capture with dual reader head	Yes	5-15% reduction in mismatch errors	All applications, particularly novel species

Methylation Error Correction Decision Pathway

Integrated Workflow for Parasite Genotyping

Complete Experimental Design

For comprehensive parasite genotyping that addresses both homopolymer and methylation challenges, we recommend an integrated workflow that combines the optimal elements from previously described methods:

Sample to Result Protocol:

DNA Extraction: Use magnetic bead-based methods from dried blood spots or whole blood, with quality assessment via fluorometry.
Library Preparation: Employ ligation sequencing kit (SQK-LSK114) with native barcoding for multiplexing, prioritizing R10.4.1 flow cells for their dual reader head design that improves both homopolymer resolution and modification detection.
Sequencing: Target 50-100× coverage using PromethION or GridION platforms for population studies, or MinION for rapid surveillance.
Basecalling: Use Dorado super-accuracy model with modified base detection enabled (5mC, 6mA, 4mC) to simultaneously generate sequence and methylation data.
Variant Calling: Apply a specialized pipeline for microhaplotype analysis with Homopolish polishing for homopolymer-rich regions.
Methylation Analysis: Process modified base calls through MIJAMP for motif discovery, or use Modpolish for reference-based correction if novel methylation is causing systematic errors.
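
The coverage target in this workflow can be turned into a sequencing yield estimate, since mean coverage is total aligned bases divided by genome size. In the sketch below, the genome size (roughly 23 Mb, the approximate size of the P. falciparum nuclear genome), the flow cell yield, the mean read length, and the on-target fraction are all illustrative assumptions rather than values from the protocol.

```python
import math

def required_yield_bp(genome_size_bp, target_coverage):
    """Total on-target bases needed to reach a mean coverage."""
    return genome_size_bp * target_coverage

def reads_needed(genome_size_bp, target_coverage, mean_read_length_bp):
    """Approximate number of on-target reads needed for the coverage target."""
    return math.ceil(required_yield_bp(genome_size_bp, target_coverage) / mean_read_length_bp)

def samples_per_flow_cell(flow_cell_yield_bp, genome_size_bp, target_coverage,
                          on_target_fraction=1.0):
    """How many samples fit on one flow cell at the target coverage.

    on_target_fraction is an assumed allowance for host or contaminant reads.
    """
    usable = flow_cell_yield_bp * on_target_fraction
    return int(usable // required_yield_bp(genome_size_bp, target_coverage))

genome = 23_000_000  # approximate P. falciparum genome size (assumption)
print(required_yield_bp(genome, 50))
print(reads_needed(genome, 50, mean_read_length_bp=10_000))
print(samples_per_flow_cell(10e9, genome, 50, on_target_fraction=0.5))
```

Because clinical samples often contain a large share of host DNA, the on-target fraction usually has more influence on achievable multiplexing than the nominal flow cell yield.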

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Research Reagents for Overcoming Systematic Errors

Reagent/Kit	Manufacturer	Function	Application Notes
Native Barcoding Kit 96 V14 (SQK-NBD114.96)	Oxford Nanopore Technologies	Sample multiplexing with native DNA	Enables pooling of 96 samples; reduces per-sample cost
R10.4.1 Flow Cells	Oxford Nanopore Technologies	Enhanced basecalling accuracy	Dual reader head improves homopolymer resolution
REPLI-g Advanced DNA Kit	Qiagen	Whole-genome amplification	Removes methylation for WGA demodification approach
Dorado Basecaller with Modified Models	Oxford Nanopore Technologies	Simultaneous basecalling and modification detection	Identifies 5mC, 6mA, 4mC without separate library prep
QIAamp DNA Micro Kit	Qiagen	DNA extraction from limited samples	Optimal for dried blood spots with low parasitemia
Custom Microhaplotype Primer Panels	Integrated DNA Technologies	Targeted amplification of polymorphic loci	Designed for specific parasite populations; 6-8 loci recommended

The integration of dual-constriction nanopores, advanced polishing algorithms, and methylation-aware analysis pipelines has substantially improved the accuracy of nanopore sequencing for parasite genotyping. By implementing the protocols detailed in this application note, researchers can achieve 25-70% improvement in homopolymer resolution and effectively overcome methylation-mediated errors that previously compromised variant detection. These advancements are particularly impactful for antimalarial drug efficacy studies, where accurate discrimination between recrudescence and new infections depends on reliable microhaplotype calling in polyclonal infections.

Looking forward, the ongoing development of nanopore technology—including the Q20+ chemistry promising raw read accuracy exceeding 99%—will further minimize systematic errors and enhance parasite genotyping applications. For the research community, adopting the standardized workflows and quality control measures described here will enable more reproducible and comparable results across studies, accelerating the development of effective interventions against parasitic diseases.

Achieving sufficient sequencing coverage is a fundamental challenge in parasite genotyping research using long-read nanopore sequencing. Low coverage can compromise the detection of single nucleotide polymorphisms (SNPs), structural variants, and key genomic features in parasite genomes, which are often complex and repetitive. The strategies employed during DNA input quantification and library preparation significantly influence data yield and quality. This application note provides detailed, evidence-based protocols to address low coverage issues, with specific considerations for parasite genomics research. By optimizing DNA input through molarity-based calculations and selecting appropriate library preparation methods, researchers can significantly improve sequencing outcomes and data reliability for downstream genotyping analyses.

Quantitative Strategies for DNA Input Optimization

Molarity-Based DNA Input Calculation

For nanopore sequencing, especially when using specialized approaches like adaptive sampling, DNA input should be calculated based on molarity rather than mass to ensure optimal pore occupancy. This is particularly critical for parasite genotyping where sample material may be limited.

Table 1: DNA Mass Calculations for Different Fragment Sizes at Constant Molarity (50 fmol)

Average Fragment Size (kb)	Calculated Mass (ng) for 50 fmol
5	165 ng
6.5	214.5 ng
10	330 ng
20	660 ng
30	990 ng

Note: Calculations based on the formula: Mass (ng) = fragment size (bp) × 660 g/mol per bp × 50 × 10^-15 mol × 10^9 ng/g [48]

The recommended loading amount for current V14 chemistry is 50-65 fmol per load. With a library centered at 6.5 kb, 50 fmol corresponds to approximately 215 ng. As Table 1 illustrates, the required mass scales linearly with fragment size (a 30 kb library needs six times the mass of a 5 kb library for the same number of molecules), which is why molarity-based loading is essential [48].
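The conversion behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming an average mass of 660 g/mol per base pair as in the note above; the function name is hypothetical and not part of any ONT tool.

```python
# Average molar mass of one double-stranded base pair (g/mol), as used in Table 1.
AVG_BP_MASS_G_PER_MOL = 660.0


def mass_ng_for_fmol(fragment_bp: float, fmol: float = 50.0) -> float:
    """Mass in ng of dsDNA holding `fmol` femtomoles of fragments of mean length `fragment_bp`."""
    grams = fragment_bp * AVG_BP_MASS_G_PER_MOL * fmol * 1e-15  # fmol -> mol
    return grams * 1e9  # g -> ng


# Recompute the rows of Table 1 (fragment sizes given in kb).
for kb in (5, 6.5, 10, 20, 30):
    print(f"{kb:>4} kb -> {mass_ng_for_fmol(kb * 1000):.1f} ng")
```

Because mass is proportional to fragment length at fixed molarity, the same helper can be used to check any other target, for example 65 fmol at the upper end of the recommended range.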

Library Fragmentation Considerations

Library fragmentation size critically impacts sequencing efficiency and coverage uniformity in parasite genotyping studies:

Shorter fragments (5-10 kb) increase molarity for the same mass input, improve pore longevity, and reduce blocking from constant strand rejection in adaptive sampling
Longer fragments may be wasteful when targeting small genomic regions, as pores remain occupied sequencing off-target regions
Parasite DNA integrity must be preserved during extraction to maintain appropriate fragment sizes for targeting specific genomic loci [48]

Library Preparation Method Selection

Comparison of Library Preparation Strategies

The choice of library preparation method introduces specific biases that affect coverage distribution and genotyping accuracy in parasite genomes.

Table 2: Performance Comparison of ONT Library Preparation Methods

Method	Average Read Length	Total Output (12 samples)	Mappable Reads	Key Biases	Best Applications for Parasite Genotyping
Ligation (LIG)	>5,000 bp	33.62 Gbp	92.9%	Minimal coverage bias; even distribution across GC content	Whole genome sequencing; methylation analysis; structural variant detection
Tagmentation (TAG)	>5,000 bp	11.72 Gbp	87.3%	Moderate coverage bias; preference for 30-40% GC regions	Rapid genotyping; SNP calling; time-sensitive studies
PCR (PCR)	<1,100 bp	4.79 Gbp	22.7%	High sequencing noise; 22.5% artifactual tandem content	Low-quality DNA samples; severely degraded parasite DNA

Data synthesized from multiple comparative studies [49] [50]

Enzymatic Bias Characterization

Different library preparation methods exhibit distinct enzymatic biases that impact coverage in parasite genomes:

Rapid (transposase-based) kits show a recognition motif (5'-TATGA-3') consistent with MuA transposase and reduced yield in regions with 40-70% GC content [49]
Ligation-based kits demonstrate relatively even coverage distribution across various GC contents, despite showing underrepresentation of adenine-thymine (AT) sequences at sequence termini [49]
PCR-based kits introduce significant artifactual content (22.5% tandem repeats) and should be avoided for quantitative applications unless dealing with severely compromised DNA [50]
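To judge how much of a target region might be affected by the GC-dependent yield drop reported for transposase-based kits, one can scan a reference sequence in windows. The sketch below is only illustrative: the 40-70% range comes from the comparison cited above, while the window size and function names are assumptions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windows_in_gc_range(seq: str, window: int = 1000, low: float = 0.40, high: float = 0.70):
    """Return (start, end, gc) for non-overlapping windows whose GC fraction lies in [low, high]."""
    hits = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if low <= gc <= high:
            hits.append((start, start + len(chunk), gc))
    return hits
```

Windows returned by `windows_in_gc_range` are candidates for closer inspection of coverage if a rapid kit is used; the function does not predict yield by itself.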

Experimental Protocols for Parasite Genotyping

Molarity-Based DNA Quantification Protocol

Objective: Accurately quantify DNA input by molarity rather than mass to optimize pore occupancy and address coverage issues in parasite genotyping studies.

Materials:

Qubit dsDNA HS Assay Kit (or equivalent fluorometric method)
Agilent Femto Pulse (for fragments >10 kb) or Agilent Bioanalyzer (for fragments <10 kb)
Biomath calculator (e.g., NEBioCalculator)

Procedure:

Determine DNA concentration using Qubit fluorometer according to manufacturer's instructions
Assess fragment size distribution using appropriate platform (Femto Pulse or Bioanalyzer)
Calculate average fragment length from size distribution data
Convert mass concentration to molar concentration:
- Use formula: Molarity (fmol/μL) = [Mass concentration (ng/μL) × 10^6] / [Average fragment length (bp) × 660 g/mol] (for example, 33 ng/μL of 10 kb fragments is 5 fmol/μL)
- Alternatively, use online biomath calculators for this conversion
Calculate loading volume based on target molarity (50-65 fmol for V14 chemistry)
Adjust sample preparation if molarity is insufficient:
- Concentrate sample if needed
- Consider fragment size reduction if molarity is too low for available mass
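The conversion and loading-volume steps of the procedure above can be sketched as follows. Since 1 ng is 10^-9 g and 1 mol is 10^15 fmol, the net unit factor is 10^6. The function names are hypothetical, and the 50 fmol default is simply the lower end of the range quoted for V14 chemistry.

```python
def fmol_per_ul(ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/uL) to a molar concentration (fmol/uL) for dsDNA."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return ng_per_ul * 1e6 / (mean_fragment_bp * 660.0)


def loading_volume_ul(ng_per_ul: float, mean_fragment_bp: float, target_fmol: float = 50.0) -> float:
    """Volume (uL) of library that contains `target_fmol` femtomoles."""
    conc = fmol_per_ul(ng_per_ul, mean_fragment_bp)
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return target_fmol / conc


# Example: a Qubit reading of 33 ng/uL with a 10 kb mean fragment length.
print(fmol_per_ul(33, 10_000), loading_volume_ul(33, 10_000))
```

If the computed volume exceeds what the flow cell loading mix can accommodate, the sample needs concentrating or shearing to a shorter mean length, as the final step of the procedure describes.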

Considerations for Parasite DNA:

Parasite DNA often contains high AT content; ensure accurate fluorometric quantification
For mixed samples (host-parasite), consider adaptive sampling to enrich parasite DNA [48]

Library Preparation Selection Workflow

Objective: Select optimal library preparation method based on parasite DNA quality and research goals to maximize coverage of target genomic regions.

Figure 1: Decision workflow for selecting optimal library preparation methods for parasite genotyping studies

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents for Addressing Low Coverage in Parasite Genotyping

Reagent Category	Specific Products	Function in Addressing Low Coverage	Parasite Genotyping Considerations
DNA Quantification	Qubit dsDNA HS Assay Kit	Accurate mass-based quantification	Prefer over spectrophotometric methods for AT-rich parasite DNA
Fragment Analysis	Agilent Femto Pulse, Bioanalyzer	Determine fragment size distribution	Critical for molarity calculations and library preparation selection
Ligation-Based Library Prep	Ligation Sequencing Kit (SQK-LSK*)	Preserve native DNA with minimal bias	Ideal for methylation analysis in parasite epigenetics studies
Rapid Library Prep	Rapid Sequencing Kit (SQK-RBK*)	Fast workflow with transposase-based fragmentation	Efficient for time-sensitive parasite surveillance studies
Barcoding Solutions	Native Barcoding Expansion	Sample multiplexing for efficiency	Enables pooling of multiple parasite isolates in single run
Adaptive Sampling	MinKNOW software with .bed files	In silico enrichment of target regions	Crucial for host-parasite mixed samples; requires reference

Product references based on ONT kit nomenclature [48] [51]

Implementation in Parasite Genotyping Research

For parasite genotyping research, specific implementation strategies enhance coverage of target genomic regions:

Adaptive Sampling Integration: Utilize MinKNOW's adaptive sampling feature with carefully designed .bed files to enrich for parasite-specific genomic regions in host-parasite mixed samples, achieving 5-10-fold enrichment when targeting <10% of the genome [48]
Coverage Requirements: For SNP genotyping in parasite populations, 0.1-0.5× coverage may be sufficient when combined with advanced imputation tools like QUILT, while structural variant detection and de novo assembly require >20× coverage [52]
Parasite DNA Preservation: Maintain DNA integrity through minimal freeze-thaw cycles and appropriate storage conditions, as parasite DNA is often more susceptible to degradation than host DNA
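The points above on adaptive sampling and coverage lend themselves to a quick planning calculation. The sketch below writes target intervals as BED lines (tab-separated contig, 0-based start, exclusive end), estimates the fraction of the genome targeted, and computes the yield needed for a given coverage. Contig names, coordinates, the flank size, and the roughly 23.3 Mb genome size used in the example are illustrative assumptions, and overlapping targets would be double-counted.

```python
def to_bed_lines(targets, flank: int = 0):
    """Format (contig, start, end) targets as BED lines, padding each side by `flank` bp.

    Coordinates are 0-based, half-open. The padded end is not clipped to the contig length.
    """
    lines = []
    for contig, start, end in targets:
        if end <= start:
            raise ValueError(f"empty interval on {contig}: {start}-{end}")
        lines.append(f"{contig}\t{max(0, start - flank)}\t{end + flank}")
    return lines


def targeted_fraction(targets, genome_size_bp: int, flank: int = 0) -> float:
    """Fraction of the genome covered by the padded targets (assumes no overlaps)."""
    total = sum((end + flank) - max(0, start - flank) for _, start, end in targets)
    return total / genome_size_bp


def required_yield_bp(genome_size_bp: int, coverage: float) -> float:
    """Total on-target bases needed to reach a given mean coverage."""
    return genome_size_bp * coverage


# Hypothetical targets on hypothetical contigs.
example_targets = [("contigA", 100_000, 120_000), ("contigB", 5_000, 9_000)]
print("\n".join(to_bed_lines(example_targets, flank=2_000)))
print(targeted_fraction(example_targets, genome_size_bp=23_300_000, flank=2_000))
```

Checking `targeted_fraction` before a run is a simple way to confirm that the panel stays under the 10% of the genome for which the enrichment figures above were reported.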

The strategies outlined provide a comprehensive framework for addressing low coverage challenges specific to parasite genotyping research using nanopore sequencing. By implementing molarity-based DNA input calculations and selecting library preparation methods based on both DNA quality and research objectives, researchers can significantly improve data yield and reliability for downstream genomic analyses.

In long-read nanopore sequencing for parasite genotyping, managing contamination is a critical challenge that impacts data integrity and experimental conclusions. Contamination can arise from various sources, including laboratory reagents, environmental DNA, cross-sample contamination, and host nucleic acids. Effective decontamination requires an integrated approach combining rigorous wet-lab techniques with sophisticated computational strategies. This application note details comprehensive protocols for managing contamination specifically within the context of parasite genomics research, enabling researchers to produce more reliable and interpretable genomic data.

Wet-Lab Decontamination Techniques

Pre-Analytical Controls and Sample Preparation

The foundation of effective contamination management begins at the sample preparation stage, where strategic controls and specialized reagents are implemented.

Essential Controls:

Extraction Blanks: Include control samples that undergo identical DNA extraction procedures without the addition of biological material to identify reagent-derived contamination [53].
Negative Control Reactions: Implement negative controls (e.g., nuclease-free water) throughout library preparation and amplification steps to detect contamination introduced during processing [3].
Batch Modeling: Process samples in carefully designed batches that include controls to account for technical variability and batch-specific contamination [53].

Sample-Specific Considerations for Parasite Genotyping: Parasite genotyping often involves challenging sample types with low pathogen biomass and high host DNA background. For Plasmodium falciparum samples, which may be obtained from dried blood spots or low-parasitemia venous blood, DNA extraction methods must be optimized to maximize parasite DNA yield while minimizing co-extraction of host DNA [6]. Specific protocols have demonstrated sensitivity thresholds of 50 parasites/μL for dried blood spots and 5 parasites/μL for venous blood samples, which require meticulous contamination control to achieve [6].
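A rough calculation shows why host DNA dominates at these parasite densities. The sketch below is a back-of-the-envelope estimate only. It assumes a nominal 8,000 white blood cells per μL, an approximate diploid human genome of 6.2 Gb, an approximate P. falciparum genome of 23.3 Mb, and one haploid genome per parasite, and it ignores extraction losses. None of these values come from the cited protocols.

```python
HUMAN_DIPLOID_GENOME_BP = 6.2e9  # assumption: approximate diploid human genome size
PF_HAPLOID_GENOME_BP = 23.3e6    # assumption: approximate P. falciparum genome size
WBC_PER_UL = 8000                # assumption: nominal white blood cell count per uL


def parasite_dna_fraction(parasites_per_ul: float, wbc_per_ul: float = WBC_PER_UL) -> float:
    """Estimated fraction of total DNA (by mass) that is parasite-derived in whole blood."""
    parasite_bp = parasites_per_ul * PF_HAPLOID_GENOME_BP
    host_bp = wbc_per_ul * HUMAN_DIPLOID_GENOME_BP
    return parasite_bp / (parasite_bp + host_bp)


for density in (5, 50, 5000):
    print(f"{density:>5} parasites/uL -> parasite DNA fraction {parasite_dna_fraction(density):.2e}")
```

Under these assumptions parasite DNA is a tiny minority of the extract at the detection limits quoted above, which is why targeted amplification or host depletion is usually needed.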

Table 1: Essential Controls for Wet-Lab Contamination Management

Control Type	Implementation	Purpose	Interpretation
Extraction Blanks	Process alongside samples without biological material	Identify reagent and kit-derived contamination	Any amplification in blanks indicates contaminating DNA in reagents
Negative PCR Controls	Include in all amplification steps	Detect amplification contaminants	False positives indicate contaminated master mixes or environment
Positive Controls	Known parasite DNA samples	Verify assay sensitivity and specificity	Ensures detection limits are maintained
Batch Controls	Process across multiple sequencing runs	Identify batch-specific contamination	Controls for inter-run variability

DNA Extraction and Library Preparation Considerations

The choice of DNA extraction and library preparation methods significantly impacts contamination profiles and microbial community recovery, as demonstrated in ancient DNA studies with relevance to modern parasite genomics [54].

DNA Extraction Method Comparisons:

Silica-Based Binding Methods: Effectively recover short, fragmented DNA but may vary in contamination profiles based on binding buffer composition [54].
Guanidinium Thiocyanate Buffer: Facilitates efficient DNA release while minimizing PCR inhibitors, though different formulations may recover different fragment size distributions [54].
Short Fragment Enrichment Protocols: Modified extraction protocols specifically enhance binding efficiency of short DNA fragments (<50 bp) in a silica matrix, potentially recovering more degraded parasite DNA [54].

Library Preparation Impact:

Double-Stranded Library (DSL) Methods: May increase clonality compared to single-stranded approaches, potentially amplifying contamination artifacts [54].
Single-Stranded Library (SSL) Methods: Allow higher conversion of DNA fragments into adapter-ligated molecules but may introduce different contamination profiles [54].

For parasite genotyping workflows targeting specific markers, such as the Pfk13 gene for artemisinin resistance, multiplexed long-amplicon approaches (approximately 2.5 kb) have been successfully implemented with minimal cross-reactivity against non-falciparum Plasmodium species when optimized primer concentrations and annealing temperatures are used [6].

Laboratory Reagents and Environmental Controls

Reagent Quality Control:

Nucleic Acid-Free Reagents: Source reagents certified nucleic acid-free or treat with DNase/UV irradiation to degrade contaminating DNA.
UV Irradiation: Expose reagents, plastics, and water to UV light (254 nm) for 30 minutes to crosslink contaminating DNA before use.
Reagent Batch Testing: Document and test different reagent lots as contamination profiles can vary significantly between manufacturing batches.

Environmental Controls:

Dedicated Workspaces: Maintain physically separated pre- and post-amplification areas with dedicated equipment and supplies.
Surface Decontamination: Implement regular cleaning with DNA-degrading solutions (e.g., 10% bleach, DNA-ExitusPlus).
Personal Protective Equipment: Wear gloves, lab coats, and potentially face masks to minimize operator-derived contamination.

Computational Decontamination Techniques

Bioinformatic Contamination Identification

Computational methods provide powerful approaches to identify and remove contamination during data analysis, particularly crucial for parasite genotyping where host DNA contamination is substantial.

Taxonomic Classification-Based Filtering:

Host Sequence Removal: Map reads to host reference genomes (e.g., human, mouse) and exclude aligned reads from downstream analysis. This is particularly important for blood-derived parasite samples where host DNA can dominate sequencing libraries [53].
Expected Microbiome Profiling: Compare identified species against expected parasite profiles and filter unexpected taxa that may represent environmental contamination.
Reagent Contamination Databases: Curate databases of common contaminants (e.g., Pseudomonas, Burkholderia, Comamonadaceae) identified in extraction blanks and negative controls for systematic removal.
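Host sequence removal, as described above, can be sketched without any external library if the reads have already been aligned to the host reference and the alignments are available in PAF format (the default output of minimap2, which is assumed here as the aligner). The mapping-quality and query-coverage thresholds are illustrative assumptions, not values from the cited studies.

```python
def host_read_ids(paf_lines, min_mapq: int = 20, min_query_cov: float = 0.5):
    """Collect IDs of reads that align confidently to the host reference.

    PAF columns used: 1 = query name, 2 = query length, 3 = query start,
    4 = query end, 12 = mapping quality.
    """
    ids = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        mapq = int(fields[11])
        if qlen > 0 and mapq >= min_mapq and (qend - qstart) / qlen >= min_query_cov:
            ids.add(qname)
    return ids


def drop_host_reads(read_ids, host_ids):
    """Return the read IDs that were not assigned to the host, preserving order."""
    return [read_id for read_id in read_ids if read_id not in host_ids]
```

Requiring a minimum aligned fraction as well as a minimum mapping quality reduces the risk of discarding parasite reads that share only a short low-complexity stretch with the host genome.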

Statistical and Threshold-Based Approaches:

Quantitative Thresholds: Implement minimum threshold criteria such as molecules per microliter (MPM) or reads per million (RPM) to filter low-level signals that may represent contamination rather than true infection [53].
Negative Control Subtraction: Remove taxa or sequences present in negative controls at similar or higher levels than in test samples.
Cross-Sample Prevalence Analysis: Identify contaminants as sequences distributed evenly across multiple samples rather than concentrated in specific samples.
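The threshold and control-subtraction rules above can be combined in a small filter. This is a minimal sketch: the RPM cutoff and the required fold-excess over the negative control are hypothetical values that each laboratory would need to calibrate against its own blanks.

```python
def reads_per_million(count: int, total_reads: int) -> float:
    """Normalize a read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


def filter_taxa(sample_counts, sample_total, control_counts, control_total,
                min_rpm: float = 10.0, min_fold_over_control: float = 10.0):
    """Keep taxa above an RPM floor that also clearly exceed the negative control."""
    kept = {}
    for taxon, count in sample_counts.items():
        rpm = reads_per_million(count, sample_total)
        control_rpm = reads_per_million(control_counts.get(taxon, 0), control_total)
        if rpm < min_rpm:
            continue  # below the quantitative threshold
        if control_rpm > 0 and rpm < min_fold_over_control * control_rpm:
            continue  # present in the blank at a comparable level
        kept[taxon] = rpm
    return kept


sample = {"Plasmodium falciparum": 5000, "Pseudomonas": 200, "Rare taxon": 1}
blank = {"Pseudomonas": 50}
print(filter_taxa(sample, 1_000_000, blank, 100_000))
```

Normalizing both the sample and the blank to RPM before comparing them matters because negative controls typically yield far fewer reads than real samples.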

Table 2: Computational Tools for Contamination Management

Tool/Approach	Application	Advantages	Implementation Considerations
Kraken2/Bracken	Taxonomic classification	Fast classification, comprehensive database	Requires custom contaminant database
Decontam (R package)	Statistical identification of contaminants	Prevalence- and frequency-based methods	Requires multiple samples and controls
Blast-based filtering	Sequence homology identification	Highly specific	Computationally intensive
Reference-based removal	Host DNA removal	Highly effective for reducing background	May remove legitimate integrated sequences
Blank subtraction	Control-based filtering	Direct removal of identified contaminants	Requires matched experimental controls

Specialized Approaches for Parasite Genotyping

Parasite-Specific Workflows: For parasite genotyping applications, specialized bioinformatic pipelines have been developed that incorporate contamination awareness directly into the analysis. For example, in nanopore amplicon sequencing of Plasmodium falciparum microhaplotypes, custom bioinformatics workflows apply rigorous cutoff criteria for accurate haplotype calling, effectively filtering potential cross-contamination between samples [3].
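The idea of cutoff criteria for haplotype calling can be illustrated with a simple filter on read support. The cutoffs used by the cited workflow are not given here, so the minimum read count and minimum within-sample frequency below are purely hypothetical, and the input is assumed to be one haplotype label per read at a single locus.

```python
from collections import Counter


def call_haplotypes(read_haplotypes, min_reads: int = 10, min_fraction: float = 0.02):
    """Return {haplotype: within-sample frequency} for haplotypes passing both cutoffs.

    Frequencies are computed against all reads at the locus, including reads
    supporting haplotypes that are later filtered out.
    """
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    calls = {}
    for haplotype, n in counts.items():
        if n >= min_reads and n / total >= min_fraction:
            calls[haplotype] = n / total
    return calls


# Hypothetical locus: two real haplotypes and a low-level signal that is filtered out.
reads = ["hapA"] * 900 + ["hapB"] * 95 + ["hapC"] * 5
print(call_haplotypes(reads))
```

Using both an absolute and a relative cutoff guards against two failure modes: spurious haplotypes in low-depth samples and low-level barcode bleed-through in high-depth samples.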

Signal-Level Analysis: Emerging approaches leverage raw nanopore "squiggle" data with artificial intelligence to distinguish viable from dead microorganisms, addressing a key limitation where DNA from dead cells can persist and skew analyses much like contamination [55]. These methods use deep neural networks (e.g., Residual Neural Networks) to predict microbial viability from raw signals, potentially differentiating between contaminating DNA and biologically relevant targets [55].

Integrated Workflow for Contamination Management

A comprehensive contamination management strategy integrates both wet-lab and computational approaches throughout the entire parasite genotyping workflow, from sample collection to final data interpretation.

Complete Workflow Diagram

Diagram: Integrated contamination management workflow for parasite genotyping studies, from sample collection to final data interpretation.

Quality Metrics and Validation

Establishing rigorous quality metrics is essential for validating the success of contamination management protocols in parasite genotyping studies.

Key Performance Indicators:

Endogenous DNA Content: The percentage of reads mapping to the target parasite genome, with higher percentages indicating effective host DNA removal [54].
Negative Control Performance: Verification that negative controls contain minimal to no amplification of target sequences, with established thresholds for maximum allowable background.
Reproducibility Metrics: Consistency in contamination profiles across technical replicates and sequencing batches.
Limit of Detection: Established sensitivity thresholds for parasite detection (e.g., 5-50 parasites/μL) while maintaining specificity against contaminants [6].

Validation Approaches:

Spike-In Controls: Use synthetic DNA controls or non-native parasite species at known concentrations to validate detection sensitivity and specificity.
Cross-Platform Validation: Compare results with orthogonal methods (e.g., Illumina sequencing, qPCR) to verify genotype calls.
Blinded Analysis: Implement blinded re-analysis of samples to assess reproducibility and minimize analytical bias.

Research Reagent Solutions

Table 3: Essential Research Reagents for Contamination Management

Reagent/Kit	Function	Contamination Management Features
QIAamp DNA Mini Kit (QIAGEN)	DNA extraction from blood samples	Silica-membrane technology for selective binding, compatible with inhibitor removal
UCP Multiplex PCR Kit	Multiplex amplification of parasite targets	Optimized for complex amplicon panels, reduced primer-dimer formation
ONT Native Barcoding Kit 96 V14	Library preparation for nanopore sequencing	Sample-specific barcoding to identify cross-contamination
DNase I Treatment Reagents	Degradation of contaminating DNA	Pre-treatment of reagents and samples to reduce background
QIAseq Beads	PCR cleanup and size selection	Removal of primer artifacts and nonspecific amplification products
Proteinase K	Sample digestion	Release of nucleic acids while degrading nucleases
UV Crosslinker	Reagent decontamination	Degradation of contaminating DNA in reagents and plastics

Application to Parasite Genotyping Research

The contamination management strategies outlined above have direct applications in parasite genotyping research, particularly for drug resistance monitoring and the study of transmission dynamics.

Case Example: Plasmodium falciparum Artemisinin Resistance Monitoring. In a recent study developing a multiplex long-amplicon sequencing panel for comprehensive molecular surveillance of P. falciparum resistance, researchers implemented rigorous contamination controls [6], including:

Species-specific primer design to minimize cross-reactivity with non-falciparum Plasmodium species
Validation of analytical sensitivity using mock samples down to 0.0001% parasitemia
Optimization of primer concentrations and annealing temperatures to eliminate nonspecific amplification
Multiplex PCR protocols with careful primer balancing to minimize amplification bias

The resulting method achieved complete coverage of resistance markers (Pfk13, Pfcoronin, Pfap2μ, Pfubp1, Pfmdr1, Pfcrt) with 100% coverage uniformity at sensitivity thresholds relevant to field-collected samples, enabling reliable distinction between true low-frequency resistance alleles and contamination artifacts [6].

Considerations for Different Sample Types:

Dried Blood Spots: Higher risk of environmental contamination during collection and storage; require specialized extraction protocols.
Low Parasitemia Samples: Higher host:parasite DNA ratio necessitates more aggressive host DNA depletion strategies.
Field-Collected Samples: Potential for geographic-specific contaminants; require local negative controls.
Long-Term Storage Samples: Risk of nucleic acid degradation and cross-contamination; require additional authentication steps.

Effective contamination management in long-read nanopore sequencing for parasite genotyping requires an integrated, end-to-end approach combining rigorous wet-lab techniques with sophisticated computational methods. By implementing the comprehensive strategies outlined in this application note—including appropriate controls, optimized laboratory protocols, and bioinformatic filtering—researchers can significantly improve the reliability and interpretability of their genotyping data. As parasite genomics continues to advance toward more sensitive detection and larger-scale surveillance, robust contamination management will remain foundational to generating clinically and epidemiologically meaningful results.

Long-read nanopore sequencing has revolutionized parasite genotyping by enabling the direct analysis of complex genomic regions, tandem repeats, and multicopy gene families that are challenging for short-read technologies. A critical first step in analyzing this data is the interpretation of read-length histograms, which provide a quantitative snapshot of the molecular population in a sequencing library. This application note details the principles and protocols for using read-length distributions to differentiate high-quality, pure parasite preparations from degraded samples or mixed-genotype infections. We provide a structured framework for troubleshooting common issues, thereby enhancing the reliability of downstream genotyping analyses in parasitology research and antimalarial drug development.

Long-read sequencing technologies, particularly those from Oxford Nanopore Technologies (ONT), have become indispensable tools in modern parasitology research. Their ability to generate reads spanning thousands of base pairs makes them ideally suited for resolving complex genomic architectures prevalent in parasitic organisms, such as tandemly repeated gene families, structurally variable antigen-encoding genes, and subtelomeric regions involved in host immune evasion [2]. For example, the genome of Plasmodium falciparum, the deadliest malaria parasite, is characterized by extensive segmental duplications and highly polymorphic gene families that are difficult to assemble and genotype with short-read technologies [2].

The process of long-read sequencing begins with the preparation of a sequencing library from high-molecular-weight (HMW) DNA. The distribution of DNA fragment lengths in the final library directly determines the distribution of read lengths generated by the sequencer. Consequently, the read-length histogram serves as a primary diagnostic tool, providing immediate visual feedback on library quality and composition before embarking on computationally intensive assembly or variant calling [56]. This is paramount for genotyping applications, where multiple plasmid species in a plasmid preparation or mixed-genotype infections in a clinical sample can confound analysis if not identified early.

The utility of long reads in parasitology is well-demonstrated in targeted sequencing approaches. For comprehensive surveillance of antimalarial drug resistance, a multiplex long-amplicon panel covering six genes (including Pfk13, Pfcoronin, and Pfmdr1) with amplicons standardized to 2.5 ± 0.2 kb was developed [6]. This panel's success hinges on generating full-length reads that each span an amplicon end to end, allowing for the detection of known and emerging resistance mutations across complete coding regions. Interpreting the read-length histogram is the first and critical step in validating such experiments.

Fundamentals of Read-Length Histogram Interpretation

A read-length histogram is a graphical representation of the data produced in a sequencing run. The x-axis represents the length of sequencing reads in base pairs (bp), and the y-axis typically shows the total amount of sequencing data, often in kilobases (kb), generated from reads of each length [56]. This weighted representation means that a few very long reads can contribute a substantial amount of data to the histogram, making it particularly sensitive to the presence of "whale" reads exceeding hundreds of kilobases.
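The effect of weighting by bases rather than by read count can be seen in a small sketch. The bin size and function name are assumptions made for illustration; plotting tools may bin differently.

```python
from collections import defaultdict


def weighted_length_histogram(read_lengths, bin_size: int = 500):
    """Map each bin start (bp) to the total number of bases from reads falling in that bin."""
    hist = defaultdict(int)
    for length in read_lengths:
        hist[(length // bin_size) * bin_size] += length
    return dict(sorted(hist.items()))


# One hundred 2.4 kb reads plus a single 250 kb read.
lengths = [2400] * 100 + [250_000]
print(weighted_length_histogram(lengths))
```

In this example the single 250 kb read contributes slightly more bases than the hundred 2.4 kb reads combined, so it produces a bar of comparable height even though it is only one molecule.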

Ideal Histogram Profiles

In an ideal scenario for a haploid parasite genotype or a single plasmid preparation, the histogram should show a single, dominant peak corresponding to the expected length of the target DNA. The distribution should be tight and symmetrical, indicating a homogeneous population of molecules. For example, a clean plasmid prep might show a sharp peak at 4,800 bp [56]. Similarly, a successful long-amplicon sequencing run for a parasite gene should yield a dominant peak at the expected amplicon size (e.g., ~2.5 kb).

Table 1: Key Features of an Ideal Read-Length Histogram

Feature	Description	Interpretation
Peak Profile	Single, dominant peak	Single, predominant molecular species in the library.
Peak Shape	Tight, symmetrical distribution	Uniform fragment lengths; minimal degradation.
Peak Location	At expected genomic length or amplicon size	Successful targeting and sequencing of the intended DNA.
Background	Low baseline outside the main peak	Minimal adapter dimers, degraded DNA, or small fragments.

Artifacts and Common Anomalies

Several technical artifacts can manifest in histograms. A single plasmid species may occasionally appear as two adjacent peaks if the read length straddles a bin boundary in the histogram, a result of inherent noise in raw reads that is corrected during consensus sequence assembly [56]. More critically, a prominent peak of very short fragments (e.g., several hundred bp) often indicates substantial DNA degradation, which can originate from poor sample collection, excessive shearing, or over-tagmentation with transposases [56]. In ATAC-seq assays for chromatin accessibility, however, a multimodal distribution with a peak at 50-100 bp (nucleosome-free regions) is expected and indicates good data quality [57].

Diagnosing Sample Purity and Mixtures from Histograms

Deviations from the ideal single peak often reveal valuable information about sample purity and the presence of mixtures, which is a frequent challenge in parasite genotyping from clinical isolates.

Identifying Multiple Plasmid Species or Mixed Infections

The presence of multiple distinct peaks in a histogram strongly suggests a mixture of different DNA molecules. In plasmid sequencing, this could indicate a mixture of the target plasmid with an empty vector or other contaminating plasmids, which would appear as separate peaks at their respective sizes [56]. In the context of parasite genotyping, multiple peaks from a long-amplicon sequencing run could signal a mixed-genotype infection, a common occurrence in endemic regions. Each peak may represent a distinct allele or haplotype of a different length. Alleles that differ only by point mutations have the same length and will not be separated in the histogram, so a single peak does not rule out a mixed infection.

Detecting Concatemers

Read-length histograms can also reveal the presence of concatemers—multimeric forms of a plasmid (e.g., dimers, trimers) that arise through homologous recombination, particularly in RecA+ bacterial strains [56]. In the histogram, these appear as secondary peaks at integer multiples of the monomeric plasmid length. For instance, a sample with a monomer at 15 kb will show a dimer peak at ~30 kb [56]. These concatemers are biological phenomena, not sequencing artifacts, and their detection is a unique advantage of long-read sequencing since they are invisible to restriction digestion or Sanger sequencing.
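Checking whether a secondary peak sits at an integer multiple of the monomer length is easy to automate. The sketch below is illustrative only; the 5% tolerance is an assumption chosen to absorb binning and read-length noise.

```python
def concatemer_order(peak_bp: float, monomer_bp: float, tolerance: float = 0.05):
    """Return n if `peak_bp` lies within `tolerance` of n x `monomer_bp` for n >= 2, else None."""
    if monomer_bp <= 0:
        raise ValueError("monomer_bp must be positive")
    n = round(peak_bp / monomer_bp)
    if n >= 2 and abs(peak_bp - n * monomer_bp) <= tolerance * n * monomer_bp:
        return n
    return None


# Peaks observed alongside a 15 kb monomer.
for peak in (15_000, 30_200, 37_000, 45_500):
    print(peak, concatemer_order(peak, 15_000))
```

A peak that is not close to any integer multiple, such as the 37 kb peak in the example, points to a different molecular species rather than a concatemer.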

Diagram: A decision workflow for diagnosing sample purity and mixtures from a read-length histogram.

Quantitative Metrics for Histogram Analysis

Beyond visual inspection, quantitative metrics derived from the sequencing data provide objective measures of library quality. These metrics are often calculated by tools like NanoPlot [58].

Table 2: Key Quantitative Metrics for Library QC from Sequencing Data

Metric	Definition	Target for a Clean Prep
Mean/Median Read Length	Mean and median length of all sequencing reads.	Should align with the expected size of the target.
Read Length N50	The length at which 50% of the total sequenced bases are contained in reads of that length or longer.	As high as possible; indicates good long-read yield.
Total Throughput	Total number of bases sequenced.	Sufficient to achieve desired coverage (e.g., 30-50x for genomes) [59].
Number of Reads	Total count of sequenced reads.	Sufficient to achieve desired coverage for the target.
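The metrics in Table 2 can be computed directly from a list of read lengths. The following sketch implements the N50 definition given in the table; it is a plain reimplementation for illustration and is not taken from NanoPlot's source.

```python
from statistics import mean, median


def n50(read_lengths) -> int:
    """Length L such that reads of length >= L contain at least half of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(read_lengths) -> dict:
    """Collect the Table 2 metrics for a list of read lengths."""
    lengths = list(read_lengths)
    if not lengths:
        return {"reads": 0, "total_bases": 0, "mean": 0, "median": 0, "n50": 0}
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }


print(summarize([1000, 2000, 3000]))
```

Because N50 is weighted by bases, it is always at least as large as the median for the same data, which is why a few very long reads can raise the N50 without changing the median much.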

Protocols for Histogram Analysis and Troubleshooting

Protocol: Generating Read-Length Histograms with NanoPlot

Purpose: To generate read-length histograms and summary statistics from raw nanopore sequencing data (FASTQ files).

Materials:

Software: NanoPlot (can be installed via pip install NanoPlot or conda install -c bioconda nanoplot) [58].
Input Data: Basecalled sequencing data in FASTQ format.

Method:

Basecalling: Ensure sequencing data has been basecalled using Guppy or Dorado to produce FASTQ files.
Run NanoPlot: Execute a command in the terminal. A basic command is: NanoPlot --fastq .fastq.gz -o --plots kde hex
- --fastq: Specifies the input FASTQ file(s).
- -o: Defines the output directory for plots and reports.
- --plots kde hex: Specifies the types of bivariate plots to generate.
Advanced Filtering (Optional): Use flags to filter data before plotting:
- --minlength 500: Hide reads shorter than 500 bp.
- --drop_outliers: Remove reads with extreme lengths.
- --N50: Show the N50 mark on the read length histogram [58].
Output Analysis: Navigate to the output directory. Open the generated HTML report to view the read-length histogram, and consult NanoStats.txt for the key summary statistics.
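When a quick check is needed without generating a full report, read lengths can also be pulled straight from a FASTQ file. The sketch below is an independent illustration, not part of NanoPlot. It assumes the common four-lines-per-record FASTQ layout, and its `min_length` argument plays a role similar to the length filter described above.

```python
import gzip
import io


def open_fastq(path):
    """Open a FASTQ file in text mode, transparently handling .gz compression."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_lengths_from_fastq(handle, min_length: int = 0):
    """Yield the length of each read in a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string
        if len(sequence) >= min_length:
            yield len(sequence)


# Small in-memory example with two reads.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n")
print(list(read_lengths_from_fastq(example)))
```

The resulting list of lengths can be passed to any histogram or summary routine, which is convenient for scripted quality gates in a genotyping pipeline.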

Protocol: Diagnosing and Responding to Histogram Anomalies

Purpose: To systematically investigate and address common undesirable patterns in read-length histograms.

Materials:

The histogram and summary statistics from the preceding protocol (Generating Read-Length Histograms with NanoPlot).
Knowledge of the expected target size (e.g., plasmid size, amplicon length).

Method:

Observation: Compare the histogram against the ideal profile (see Ideal Histogram Profiles and Table 1).
Diagnosis and Action:
- No dominant peak, high read count: Suggests an abundance of degraded DNA.
  - Action: Re-assess the DNA extraction protocol. For plants and parasites with complex metabolites, use SDS-based extractions with chloroform cleaning and bead-based purification to remove contaminants [59]. Always use cut or wide-bore tips to prevent shearing HMW DNA.
- Multiple peaks of different sizes: Indicates a mixture of molecular species.
  - Action: If this is unexpected for your experiment (e.g., expected clonal parasite), investigate sample contamination. If it is expected (e.g., mixed infection), use the histogram to gauge the complexity of the mixture before multi-sequence consensus assembly.
- Peaks at integer multiples of target size: Confirms concatemerization.
  - Action: This is a biological, not technical, issue. Consider using recA-deficient (recA-) bacterial strains for plasmid propagation in the future. For analysis, the consensus sequence is typically generated from the monomer [56].
- Low total read count / no data: Indicates a failed sequencing run or insufficient library concentration.
  - Action: Verify DNA concentration with a fluorometric method (e.g., Qubit). Ensure library preparation protocols were followed precisely, paying attention to input DNA requirements [56].
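
The "integer multiples of target size" check above can be sketched in a few lines of Python. This is an illustrative heuristic rather than a published method: the function names and the 10% tolerance are assumptions chosen for the example.

```python
from collections import Counter
from typing import Iterable


def copy_number(read_length: int, target_size: int, tolerance: float = 0.1) -> int:
    """Return k if read_length lies within tolerance * target_size of k * target_size.

    Returns 0 when the read does not match any integer multiple (k >= 1),
    for example a degraded fragment much shorter than the target.
    """
    k = round(read_length / target_size)
    if k >= 1 and abs(read_length - k * target_size) <= tolerance * target_size:
        return k
    return 0


def summarise_copies(lengths: Iterable[int], target_size: int, tolerance: float = 0.1) -> Counter:
    """Count reads per inferred copy number (0 = unclassified)."""
    return Counter(copy_number(n, target_size, tolerance) for n in lengths)


if __name__ == "__main__":
    # Hypothetical 5 kb plasmid: two monomers, one dimer, one short fragment.
    print(summarise_copies([5000, 4900, 10050, 1200], target_size=5000))
```

A large share of reads with a copy number of 2 or more supports concatemerization, whereas a large unclassified share points towards degradation or a mixed sample.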

The Scientist's Toolkit

Table 3: Essential Research Reagents and Tools for Long-Read Parasite Genotyping

Item	Function/Application
ONT Ligation or Rapid Sequencing Kits (e.g., SQK-ULK001)	Library preparation for generating ultra-long reads; critical for spanning complex repeats in parasite antigens [60].
Circulomics or similar HMW DNA extraction kits	High-quality DNA extraction is foundational; minimizes shearing and preserves long fragments for sequencing [60].
Multiplex PCR Kits (e.g., UCP Multiplex PCR kit)	For targeted long-amplicon sequencing panels used in resistance marker surveillance (e.g., for P. falciparum) [6].
NanoPlot Software	Primary tool for generating read-length histograms and initial quality assessment from FASTQ data [58].
Dorado Basecaller	ONT's software for converting raw electrical signal (squiggle) to nucleotide sequence; improved models (e.g., sup@v5.0) enhance accuracy [2].
Medaka	ONT's tool for polishing consensus sequences from nanopore reads, improving final assembly accuracy [2].

Benchmarking Performance: Validation and Cross-Platform Comparison for Diagnostic Reliability

Performance Benchmarking of SNV Calling

The integration of long-read nanopore sequencing into clinical and research genomics requires rigorous validation of its accuracy in detecting single nucleotide variants (SNVs). Performance is typically measured by precision (positive predictive value) and recall (sensitivity), which together provide a comprehensive view of variant calling accuracy [61].

Table 1: SNV and Small Indel Calling Performance with ONT in Clinical-grade Samples

Variant Type	Precision (PPV)	Recall (Sensitivity)	F1 Score	Sequencing Coverage	Basecaller Model
Single Nucleotide Variants (SNVs)	0.997	0.992	0.995	24-42x	Dorado HAC [61]
Small Insertions/Deletions (Indels)	0.922	0.838	0.878	24-42x	Dorado HAC [61]

Independent research using the miniSNV algorithm, optimized for Oxford Nanopore Technologies (ONT) data, corroborates these high-performance metrics, reporting superior or competitive F1-scores for SNV calling compared to other state-of-the-art approaches [62]. This demonstrates that ONT sequencing can achieve accuracy comparable to legacy methods, making it suitable for clinical applications.

The choice of basecalling model directly impacts raw read accuracy, which forms the foundation for successful variant calling. The latest ONT chemistry and basecallers have significantly improved single-read accuracy [63].

Table 2: Impact of Basecalling on Raw Read Accuracy

Basecalling Model	Reported Raw Read Accuracy	Typical Use Case
High Accuracy (HAC)	>99% (Q20) [63]	High-throughput variant analysis [63]
Super Accuracy (SUP)	99.75% (Q26) [63]	De novo assembly, low-frequency variants [63]
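
The accuracy percentages and Q-scores in the table are two views of the same quantity, linked by the standard Phred relationship Q = -10 × log10(error rate). A minimal Python sketch of the conversion (function names are illustrative):

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (e.g. 0.99) to a Phred quality score."""
    return -10 * math.log10(1 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to the corresponding per-base accuracy."""
    return 1 - 10 ** (-q / 10)


if __name__ == "__main__":
    for acc in (0.99, 0.9975):
        print(f"accuracy {acc:.4f} -> Q{accuracy_to_phred(acc):.1f}")
```

Because the scale is logarithmic, every 10-point increase in Q corresponds to a tenfold reduction in the per-base error rate.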

Detailed Experimental Protocols

Protocol 1: Whole-Genome Sequencing for SNV Detection

This protocol is designed for comprehensive SNV discovery across a parasite genome, validated for use with the ONT PromethION 2 system [61].

Required Materials:

Oxford Nanopore Technologies (ONT) PromethION 2 Solo sequencer
PromethION Flow Cells (R10.4 chemistry)
Genomic DNA Ligation Sequencing Kit V14 (SQK-LSK114)
Covaris g-TUBES
Dorado basecaller software
EPI2ME Labs wf-alignment & wf-human-variation pipelines

Step-by-Step Procedure:

DNA Shearing: Use Covaris g-TUBES to shear 1500 ng of high-quality genomic DNA to a target fragment size of 10-15 kb by centrifuging for 1 minute at 2000 RCF at room temperature [61].
Library Preparation: Construct the sequencing library according to the ONT Genomic DNA Ligation Sequencing Kit V14 protocol. This involves DNA end-repair, dA-tailing, and adapter ligation [61].
Sequencing: Load the prepared library onto a PromethION flow cell (R10.4) and sequence using the P2 solo device with MinKNOW software (v23.07.8 or later). Aim for a sequencing depth of 30-40x coverage for robust variant calling [61].
Basecalling: After the run, perform basecalling of the raw signal data using the Dorado basecaller (v0.3.3 or later). Use the High Accuracy (HAC) model (dna_r10.4.1_e8.2_400bps_hac@v4.2.0) for an optimal balance of speed and accuracy [61].
Variant Calling:
- Align the basecalled FASTQ files to a reference genome using minimap2 within the EPI2ME wf-alignment pipeline [61].
- Process the aligned BAM files using the EPI2ME wf-human-variation pipeline (v1.7.0) for variant calling. For SNVs and small indels, the pipeline employs Clair3 (v1.0.4) with default parameters [62] [61].
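
When planning a run against the 30-40x target above, it can help to translate coverage into required yield. The sketch below is a back-of-the-envelope calculation only; the genome size used in the example (about 23.3 Mb for P. falciparum) and the 10 kb mean read length are approximate assumptions, and `on_target_fraction` is a hypothetical parameter for samples where host DNA consumes part of the yield.

```python
import math


def required_yield_bases(genome_size_bp: int, target_coverage: float,
                         on_target_fraction: float = 1.0) -> float:
    """Total bases to sequence so that the target genome reaches the desired mean coverage."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return genome_size_bp * target_coverage / on_target_fraction


def reads_needed(genome_size_bp: int, target_coverage: float,
                 mean_read_length_bp: float, on_target_fraction: float = 1.0) -> int:
    """Approximate number of reads required at a given mean read length."""
    total = required_yield_bases(genome_size_bp, target_coverage, on_target_fraction)
    return math.ceil(total / mean_read_length_bp)


if __name__ == "__main__":
    # Assumed values: ~23.3 Mb genome, 30x coverage, 10 kb mean read length.
    print(required_yield_bases(23_300_000, 30))
    print(reads_needed(23_300_000, 30, 10_000))
```

Mean coverage does not guarantee even coverage, so regions with extreme base composition may still fall below the depth needed for confident variant calls.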

Protocol 2: Targeted Amplicon Sequencing for Parasite Genotyping

This protocol uses a deep amplicon sequencing approach, ideal for focused studies on specific genetic markers, such as those for drug resistance in Plasmodium falciparum [64] [65].

Required Materials:

Target-specific PCR primers
PCR reagents
ONT Ligation Sequencing Kit
MinION or GridION sequencer

Step-by-Step Procedure:

Multiplex PCR Amplification: Design primers to target specific SNP-containing genes (e.g., dhfr, dhps, pfmdr1, pfcrt, k13 for malaria). Perform the PCR reaction directly from parasite cultures or infected blood, using conditions optimized for multiplexing [64].
Library Preparation: Pool the purified PCR amplicons and prepare the sequencing library using an ONT Ligation Sequencing Kit, following the standard protocol [64].
Sequencing and Analysis: Load the library onto a MinION or GridION flow cell. Following sequencing, basecall the reads and align them to a reference sequence. Identify SNPs by analyzing the consensus at each targeted codon [64].
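
The final step, reading the consensus at a targeted codon, can be illustrated with a short Python sketch. It is a simplified stand-in for a real pipeline: it assumes reads have already been aligned, trimmed to the coding sequence in frame, and contain no gaps, and the function name, 0-based codon index, and minimum depth are illustrative choices.

```python
from collections import Counter
from itertools import product
from typing import Optional, Sequence, Tuple

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def call_codon(aligned_reads: Sequence[str], codon_index: int,
               min_depth: int = 20) -> Optional[Tuple[str, str, float]]:
    """Return (consensus codon, amino acid, supporting fraction) at a 0-based codon index.

    Returns None when fewer than min_depth reads cover the codon with unambiguous bases.
    """
    start = 3 * codon_index
    codons = Counter()
    for read in aligned_reads:
        codon = read[start:start + 3].upper()
        if len(codon) == 3 and set(codon) <= set(BASES):
            codons[codon] += 1
    depth = sum(codons.values())
    if depth < min_depth:
        return None
    codon, count = codons.most_common(1)[0]
    return codon, CODON_TABLE[codon], count / depth


if __name__ == "__main__":
    # Toy data: 18 reads carry TGT (Cys) and 2 carry TAT (Tyr) at codon index 1.
    reads = ["ATGTGTAAA"] * 18 + ["ATGTATAAA"] * 2
    print(call_codon(reads, codon_index=1, min_depth=10))
```

Reporting the supporting fraction alongside the call makes it easier to spot mixed infections, where a minority codon would otherwise be hidden by the consensus.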

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents and Tools for ONT-based SNV Studies

Item	Function/Application	Example/Specification
Genomic DNA Ligation Sequencing Kit V14 (SQK-LSK114)	Prepares genomic DNA libraries for sequencing on ONT platforms.	ONT [61]
PromethION Flow Cell (R10.4)	The consumable containing nanopores for sequencing; R10.4 chemistry improves accuracy.	ONT [61]
Dorado Basecaller	Software that translates raw electrical signals into nucleotide sequences.	Supports HAC and SUP models [63] [61]
Clair3	A deep-learning tool for accurate variant calling (SNVs/indels) from long-read data.	Integrated in EPI2ME wf-human-variation [62] [61]
miniSNV	A lightweight, high-performance SNV calling algorithm for ONT data.	GitHub: https://github.com/CuiMiao-HIT/miniSNV [62]
GIAB Reference Samples	Gold-standard control samples (e.g., HG002) for benchmarking pipeline performance.	Genome in a Bottle Consortium [61]

The quantitative data and detailed protocols presented herein confirm that Oxford Nanopore long-read sequencing, when coupled with optimized bioinformatic pipelines like Clair3 and miniSNV, achieves high precision and recall in SNV calling from clinical samples. For parasite genotyping research, this enables reliable detection of drug-resistance markers and strain typing, providing a powerful tool for tracking the emergence and spread of antimicrobial resistance.

The selection of an appropriate sequencing platform is a critical strategic decision in parasite genomics. This application note provides a detailed, evidence-based comparison of Oxford Nanopore Technologies (ONT) and Illumina sequencing platforms, specifically for parasite genotyping research. While Illumina remains the benchmark for accuracy in variant calling and phylogenetic analysis, Nanopore sequencing offers transformative advantages in workflow speed, portability, and the ability to resolve complex genomic regions. The following data-driven analysis and protocols will guide researchers in matching platform capabilities to specific research objectives in parasite genomics.

Performance Metric Comparison

The table below summarizes the core performance characteristics of Illumina and Nanopore technologies, based on recent comparative studies.

Table 1: Head-to-Head Comparison of Sequencing Platforms for Parasite Genomics

Feature	Illumina (Short-Read)	Oxford Nanopore (Long-Read)
Read Length	Short (50-300 bp) [2]	Long (>100 kb possible) [66] [67]
Typical Raw Read Accuracy	Very High (Q25-Q30) [68] [22]	Moderate to High (Q15 with R9; >Q20 with R10.4+) [68] [22] [67]
Turnaround Time	Several hours to days	Rapid; preliminary results in hours, final reports within 24 hours [32]
Portability	Lab-bound instrumentation	High; portable devices (e.g., MinION) enable field sequencing [66] [2]
Key Strength in Parasite Genomics	High-resolution SNP calling, accurate phylogenetics for transmission dynamics [68] [22]	Resolving complex regions, structural variants, epigenetic modifications, and rapid field deployment [2]
Primary Limitation	Inability to resolve repetitive regions and complex structural variations [22] [2]	Higher per-base error rate can limit SNP-level resolution for transmission chains [68] [22] [16]
Cost & Workflow	Higher instrument cost, established library prep	Lower startup cost, simpler and faster library preparation [68]

Experimental Evidence and Application in Parasitology

Case Study: Genomic Surveillance of Clostridioides difficile

A 2025 study provides a direct performance comparison for bacterial genomic surveillance, offering insights applicable to parasite genomics [68] [22].

Accuracy: Illumina sequencing produced reads with an average accuracy of 99.68% (Q25), compared with 96.84% (Q15) for Nanopore, a roughly tenfold difference in per-base error rate [68] [22].
Impact on Genotyping: Nanopore sequences exhibited an average of 640 base errors per genome, which led to the incorrect assignment of over 180 alleles in core genome multilocus sequence typing (cgMLST) analysis. Consequently, Nanopore-derived phylogenies were less accurate than the Illumina reference, making them inadequate for precise investigation of transmission events [68] [22].
Virulence Gene Detection: Both platforms provided comparable, satisfactory results for detecting key virulence genes (tcdA, tcdB, cdtAB) [68] [22].

Case Study: Targeted Enrichment for Schistosoma mansoni with Adaptive Sampling

A 2025 study evaluated Nanopore's adaptive sampling for enriching parasite DNA in challenging samples, a common scenario in parasitology [16].

Challenge: Miracidia larvae of S. mansoni preserved on FTA cards often yield low quantities of DNA with high levels of host and environmental contamination.
Methodology: Researchers applied adaptive sampling, a technique that enriches for target DNA (using a reference genome) by rejecting off-target reads in real-time during sequencing.
Finding: While washing samples increased the proportion of S. mansoni DNA, adaptive sampling alone failed to generate sufficient on-target reads for effective whole-genome sequencing. The study concluded that pre-sequencing washing remains critical, and adaptive sampling in its current form is insufficient for robust enrichment in this specific application [16].

Detailed Experimental Protocols

Protocol: Rapid Metagenomic Detection of Pathogens using Nanopore Sequencing

This protocol, adapted from studies on respiratory infections, is ideal for the untargeted detection of parasites, bacteria, and viruses in clinical samples [32].

Sample Processing: Collect bronchoalveolar lavage fluid (BALF) or other relevant sample matrix. Use 1 mL for DNA extraction.
DNA Extraction & Host Depletion: Extract DNA using the QIAamp UCP Pathogen DNA Kit. Treat sample with Benzonase and Tween20 to degrade human host DNA.
Library Preparation: Utilize the ONT rapid barcoding kit (e.g., SQK-RBK114.96) for rapid, multiplexed library preparation. This step can be completed in less than 2 hours.
Sequencing: Load the library onto a MinION R10.4.1 flow cell and commence sequencing.
Real-Time Analysis: Start the sequencing run with real-time basecalling and analysis. Pathogen identification can be achieved using the EPI2ME platform or a custom bioinformatics pipeline aligning reads to a curated pathogen database.
Turnaround: Preliminary results are available within hours, with comprehensive reporting possible within 24 hours [32].

Protocol: High-Accuracy Bacterial Genotyping using Illumina

This protocol is designed for applications requiring maximum base-level accuracy, such as constructing reference genomes or outbreak investigation [22].

Culture and DNA Extraction: Culture isolates and extract high-molecular-weight genomic DNA. For bacterial isolates, a pre-lysis step with lysozyme may be used.
Library Preparation: Construct libraries using the Nextera XT DNA Library Preparation Kit, following the manufacturer's instructions.
Sequencing: Sequence on an Illumina NextSeq 500 instrument using a 2x150 bp cycle kit.
Bioinformatic Analysis:
- Quality Control: Use FastQC to assess read quality and Trimmomatic to trim adapters and low-quality bases.
- Variant Calling: Map reads to a reference genome using BWA or Bowtie2. Call SNPs and indels using tools like GATK or Snippy.
- Phylogenetic Analysis: Generate high-resolution phylogenetic trees based on SNP profiles or cgMLST schemes to investigate transmission pathways.

Workflow Visualization

The following diagram illustrates the key decision points and optimal paths for selecting a sequencing platform based on parasite genomics research goals.

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 2: Key Reagents and Kits for Parasite Genomics Workflows

Item	Function/Application	Example Product (Supplier)
Pathogen DNA/RNA Kit	Simultaneous extraction of total nucleic acid from complex samples.	MagPure Pathogen DNA/RNA Kit (Magen) [69]
Host Depletion Reagents	Selective removal of human host DNA to increase microbial sequencing depth.	Benzonase (Qiagen) [32] [69]
ONT Rapid Barcoding Kit	Fast, multiplexed library preparation for Nanopore sequencing.	SQK-RBK114.96 (Oxford Nanopore) [32] [22]
Illumina DNA Prep Kit	Robust library preparation for Illumina short-read sequencing.	Nextera XT DNA Library Prep Kit (Illumina) [22]
FTA Cards	Long-term stabilization of nucleic acids from field-collected samples (e.g., miracidia).	Whatman Indicating FTA Cards (Cytiva) [16]
Probe-based Enrichment Kit	Targeted enrichment for specific pathogens or resistance genes from complex samples.	Capture-based tNGS panel (various suppliers) [69]

The choice between Nanopore and Illumina sequencing is not a matter of declaring a universal winner, but of strategically aligning technology strengths with research questions. For parasite genotyping, Illumina is the superior choice when the research demands the highest possible base-level accuracy for applications like constructing reference genomes, identifying subtle SNPs for micro-epidemiology, and tracing fine-scale transmission pathways [68] [22]. In contrast, Nanopore sequencing offers a transformative advantage in speed, portability, and long-read capability, making it ideal for rapid pathogen identification in field settings, de novo genome assembly, resolving complex structural variations, and studying epigenetic modifications [66] [32] [2]. As Nanopore's accuracy continues to improve with new chemistries and basecalling algorithms [67] [2], the gap is narrowing, promising even more powerful and integrated solutions for the future of parasite genomics.

Validating a Comprehensive Diagnostic Pipeline for Diverse Variant Types

The field of parasite genotyping has been transformed by the advent of long-read sequencing technologies, which offer unprecedented capability to resolve complex genomic regions that were previously inaccessible with short-read platforms. This application note details the validation of a comprehensive diagnostic pipeline using Oxford Nanopore Technologies (ONT) long-read sequencing for detecting diverse variant types crucial for parasite research and drug development. The implementation of a unified technique that can simultaneously detect a broad spectrum of genetic variation substantially increases the efficiency of the diagnostic process, which is particularly valuable in parasitology where multiple discrete rounds of genetic testing can lead to significant delays and financial burden [70]. For researchers studying parasite genomics, this validated pipeline provides a robust foundation for investigating host-parasite interactions, tracking drug resistance markers, and understanding population dynamics with a level of resolution that was previously unattainable.

Experimental Design and Validation Strategy

Benchmarking Samples and Truth Sets

A critical component of pipeline validation involves the use of well-characterized reference materials to establish accuracy metrics. The validation approach incorporates two complementary strategies:

Reference Cell Lines: Genome in a Bottle (GIAB) reference samples (e.g., HG002-HG007) with available truth sets acquired from the Coriell Institute provide a gold standard for variant calling performance assessment. These samples have extensively characterized variants that serve as ground truth for calculating precision and recall metrics [71].
Clinical and Parasite-Specific Samples: For parasitology applications, the use of well-defined laboratory strain mixtures and previously characterized clinical samples is essential. Studies have successfully used defined ratios of multiple Plasmodium falciparum strains (e.g., 3D7, K1, HB3, FCB1) in mixtures ranging from 1:1:1:1 to 1:100:100:100 to validate detection sensitivity for minority clones in polyclonal infections [3].
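
To see what a 1:100:100:100 mixture implies for sequencing depth, the expected share of the minority strain can be worked out directly. The Python sketch below is illustrative arithmetic only; the function names are hypothetical, and the 25,000-read figure in the example is the per-marker target quoted in the amplicon protocol of this note.

```python
from typing import List, Sequence


def strain_fractions(ratio: Sequence[float]) -> List[float]:
    """Convert a mixing ratio such as (1, 100, 100, 100) into per-strain fractions."""
    total = sum(ratio)
    return [part / total for part in ratio]


def expected_minority_reads(ratio: Sequence[float], reads_per_marker: int) -> float:
    """Expected number of reads from the least abundant strain, assuming unbiased amplification."""
    return min(strain_fractions(ratio)) * reads_per_marker


if __name__ == "__main__":
    ratio = (1, 100, 100, 100)
    print(f"minority fraction: {strain_fractions(ratio)[0]:.4%}")
    print(f"expected minority reads at 25,000 reads: {expected_minority_reads(ratio, 25_000):.0f}")
```

In this four-strain design the minority clone makes up 1/301 of the template (about 0.33%), so the depth per marker needs to be high enough that its reads clearly exceed the background error rate.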

Performance Metrics and Statistical Analysis

Rigorous benchmarking requires standardized metrics to evaluate variant calling accuracy. The Global Alliance for Genomics and Health (GA4GH) benchmarking tools provide a framework for this analysis, classifying each variant as a true positive (TP), false positive (FP), or false negative (FN) [71]. The following key metrics are calculated:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 × (Precision × Recall) / (Precision + Recall)

These metrics are calculated separately for different variant types, including single nucleotide variants (SNVs), small insertions/deletions (indels), structural variants (SVs), and repeat expansions to provide a comprehensive assessment of pipeline performance [71].
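
The three formulas above translate directly into code. The following minimal Python sketch uses illustrative function names and hypothetical TP/FP/FN counts, not values from the cited studies.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are true: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts for one variant class.
    tp, fp, fn = 992, 3, 8
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two values, which is why the lower recall for indels weighs heavily on their F1 score.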

Results: Comprehensive Performance Validation

Table 1: Performance metrics of the long-read sequencing pipeline for diverse variant types

Variant Type	Precision	Recall	F1 Score	Validation Context
SNVs	99.7%	99.2%	99.4%	GIAB samples, 30-40x coverage [71]
Small Indels	92.2%	83.8%	87.8%	GIAB samples, 30-40x coverage [71]
Structural Variants	>99.9%	>99.9%	>99.9%	Clinical samples, complex SVs [70]
Repeat Expansions	99.4%	99.4%	99.4%	72 clinical samples [70]
Minority Clones	Detected at 1:100 ratio	-	-	Plasmodium strain mixtures [3]

Detailed Performance by Variant Category

Single Nucleotide Variants and Small Indels: The pipeline demonstrates excellent performance for SNV detection, with a precision of 99.7% and a recall of 99.2% when using high-accuracy basecalling models at 30-40x coverage [71]. Small indel detection, historically challenging for long-read technologies, reaches a precision of 92.2% and a recall of 83.8% with modern bioinformatics tools [71]. Deep learning-based variant callers such as Clair3 have been shown to provide the most accurate results for both SNPs and indels, with F1 scores exceeding 99.5% for SNPs and 99.2% for indels when using super-accuracy basecalling models [72].

Structural Variants and Complex Genomic Alterations: Long-read sequencing excels in detecting structural variants, with demonstrated precision and recall exceeding 99.9% for complex SVs [70]. This capability is particularly valuable in parasitology for identifying large-scale amplifications, deletions, and rearrangements associated with drug resistance or virulence. The technology enables direct phasing of compound heterozygous variants from singleton patient data, confirming autosomal recessive inheritance patterns that are relevant for understanding parasite susceptibility genes [73].

Sensitivity for Minority Clones in Polyclonal Infections: In parasite genomics, the ability to detect minority clones in polyclonal infections is crucial for understanding resistance emergence and strain dynamics. The optimized pipeline demonstrates exceptional sensitivity, reliably detecting minority clones at ratios as low as 1:100 in complex strain mixtures, with false-positive haplotypes occurring at rates below 0.01% [3]. This performance enables accurate distinction between recrudescence and new infections in antimalarial drug trials, with consistent classification in 85% of paired patient samples across multiple genetic markers [3].

Detailed Experimental Protocols

Sample Preparation and DNA Extraction

Protocol 1: High-Molecular-Weight DNA Extraction from Blood Samples

For optimal long-read sequencing results, DNA integrity is paramount. The following protocol is adapted from validated clinical and parasitology studies:

Sample Collection: Collect blood samples in SET buffer or as dried blood spots (DBS) depending on field conditions [3] [11].
DNA Extraction: Purify DNA using either:
- ReliaPrep Large Volume HT gDNA Isolation kit (Promega) for larger blood volumes (>1 ml) [73]
- Chemagic DNA Blood kit (Revvity) for smaller blood volumes (<1 ml) [73]
- DNeasy Blood & Tissue Kit (Qiagen) for previously extracted samples [70]
DNA Quantification and Quality Control: Quantify DNA using fluorometric methods (e.g., Qubit dsDNA BR Assay) and assess fragment size distribution using agarose gel electrophoresis or TapeStation analysis. Ideally, samples should have approximately 80% of sheared fragments between 8 kb and 48.5 kb in length [70].

Protocol 2: Selective Whole Genome Amplification for Parasite-Enriched DNA

In parasite genomics where host DNA dominates, Selective Whole Genome Amplification (SWGA) significantly improves parasite sequencing yield:

Primer Design: Design primers using swga2.0 software with parasite reference genomes and host genome as inputs [74].
Amplification Reaction:
- Dilute DNA samples to 25 ng/μL
- Mix 2 μL diluted DNA with 2.5 μL primer set mix and 0.5 μL 10× EquiPhi29 Reaction Buffer
- Denature at 95°C for 3 minutes, then immediately place on ice
- Add 15 μL amplification master mix (1.5 μL 10× EquiPhi29 Reaction Buffer, 0.2 μL DTT, 2 μL dNTP mix, 1 μL EquiPhi29 DNA polymerase, 1 μL pyrophosphatase, 9.3 μL water)
- Incubate at 45°C for 3 hours followed by 65°C for 10 minutes [74]
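
A quick arithmetic check of the reaction set-up above helps avoid pipetting errors when scaling to many samples. The Python sketch below uses the component volumes listed in the protocol; the 10% overage and the function names are assumptions for illustration.

```python
from typing import Dict

# Volumes per reaction in microlitres, as listed in the protocol above.
DENATURATION_MIX_UL: Dict[str, float] = {
    "diluted DNA": 2.0,
    "primer set mix": 2.5,
    "10x EquiPhi29 Reaction Buffer": 0.5,
}
MASTER_MIX_UL: Dict[str, float] = {
    "10x EquiPhi29 Reaction Buffer": 1.5,
    "DTT": 0.2,
    "dNTP mix": 2.0,
    "EquiPhi29 DNA polymerase": 1.0,
    "pyrophosphatase": 1.0,
    "water": 9.3,
}


def total_volume(components: Dict[str, float]) -> float:
    """Sum component volumes, rounded to avoid floating-point noise."""
    return round(sum(components.values()), 2)


def scale_mix(components: Dict[str, float], n_reactions: int, overage: float = 0.1) -> Dict[str, float]:
    """Scale a per-reaction recipe to n reactions with a fractional overage (assumed 10%)."""
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2) for name, volume in components.items()}


if __name__ == "__main__":
    print("master mix per reaction (uL):", total_volume(MASTER_MIX_UL))
    print("final reaction volume (uL):", total_volume(DENATURATION_MIX_UL) + total_volume(MASTER_MIX_UL))
    print("master mix for 8 reactions:", scale_mix(MASTER_MIX_UL, 8))
```

The listed components sum to the stated 15 μL of master mix, giving a 20 μL final reaction once the 5 μL denatured template mix is added.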

Library Preparation and Sequencing

Protocol 3: Nanopore Library Preparation for Whole Genome Sequencing

This protocol is adapted from the ONT genomic DNA ligation sequencing kit (SQK-LSK114) used in multiple validation studies:

DNA Shearing: Shear 1.5 μg genomic DNA to 10-15 kb fragments using Covaris g-TUBES centrifuged at 2000 RCF for 1 minute at room temperature [71].
Library Preparation:
- Perform DNA repair and end-prep using NEBNext Ultra II End Repair/dA-tailing Module
- Adapter ligation using ONT Native Barcodes for multiplexing
- Purify using AMPure XP beads
- Prime the library using ONT Sequencing Primer and load onto flow cells [71] [3]
Sequencing: Load prepared libraries onto PromethION flow cells (R10.4.1) and sequence using Kit 14 chemistry with MinKNOW software. Target 30-40x coverage for human genomes or 150,000 reads per sample for targeted parasite sequencing [73] [3].

Protocol 4: Targeted Amplicon Sequencing for Parasite Genotyping

For specific parasite genotyping applications, targeted amplicon sequencing provides a cost-effective alternative:

Multiplex PCR: Amplify target loci using previously validated primer sets for polymorphic microhaplotype loci (e.g., ama1, celtos, cpmp, cpp, csp, and surfin1.1 for Plasmodium falciparum) [3].
Library Preparation: Use ONT Native Barcoding Kit 96 V14 (SQK-NBD114.96) following manufacturer's instructions with modifications for amplicon sequencing [3].
Sequencing: Perform sequencing on MinION Mk1C platform with R10.4.1 flow cells, targeting approximately 25,000 reads per marker per sample [3].

Bioinformatic Analysis Pipeline

Protocol 5: Comprehensive Variant Calling Workflow

The integrated bioinformatics pipeline utilizes a combination of publicly available variant callers optimized for different variant types:

Basecalling and Alignment:
- Perform basecalling using Dorado with super-accuracy model (dna_r10.4.1_e8.2_400bps_sup)
- Align FASTQ files to reference genome using minimap2 (v2.26) [71]
Variant Calling:
- SNVs and small indels: Clair3 (v1.0.4) [71]
- Structural variants: Sniffles2 (v2.2) [71]
- Copy number variants: QDNAseq (v1.38) [71]
- Repeat expansions: Straglr [71]
Variant Prioritization and Annotation:
- Annotate variants using Exomiser including REVEL, CADD, AlphaMissense, and SpliceAI in silico prediction tools [73]
- For parasitology applications, use custom scripts to identify haplotype frequencies in polyclonal infections [3]
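
The cited study's custom scripts are not reproduced here, but the general idea of estimating haplotype frequencies in a polyclonal infection can be sketched as follows. This Python example assumes each read has already been assigned a haplotype label for one marker; the function name and both thresholds are illustrative assumptions, not published parameters.

```python
from collections import Counter
from typing import Dict, Iterable


def haplotype_frequencies(read_haplotypes: Iterable[str],
                          min_reads: int = 10,
                          min_frequency: float = 0.01) -> Dict[str, float]:
    """Return within-sample haplotype frequencies, dropping likely artefacts.

    A haplotype is kept only if it is supported by at least min_reads reads
    and makes up at least min_frequency of all reads for the marker.
    """
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        haplotype: n / total
        for haplotype, n in counts.items()
        if n >= min_reads and n / total >= min_frequency
    }


if __name__ == "__main__":
    # Toy marker: one dominant clone, one minority clone, one singleton artefact.
    reads = ["hapA"] * 980 + ["hapB"] * 19 + ["hapC"] * 1
    print(haplotype_frequencies(reads))
```

Thresholds of this kind trade sensitivity for specificity, so they are best tuned against control mixtures of known composition before being applied to field samples.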

Diagram 1: Comprehensive workflow for validating a diagnostic pipeline for diverse variant types using long-read nanopore sequencing

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential research reagents and materials for implementing the comprehensive diagnostic pipeline

Category	Specific Product	Application & Function	Key Features
DNA Extraction	ReliaPrep Large Volume HT gDNA Isolation kit (Promega)	High-molecular-weight DNA extraction from large blood volumes	Maintains DNA integrity for long-read sequencing [73]
	Chemagic DNA Blood kit (Revvity)	DNA extraction from small blood volumes (<1 ml)	Optimized for low-input samples [73]
Library Preparation	Ligation Sequencing Kit V14 (SQK-LSK114, ONT)	Whole genome sequencing library preparation	Compatible with multiplexing using native barcodes [71]
	Native Barcoding Kit 96 V14 (SQK-NBD114.96, ONT)	Multiplexed library preparation for up to 96 samples	Enables cost-effective sequencing of multiple samples [3]
Amplification	EquiPhi29 kit (Thermo Fisher Scientific)	Selective whole genome amplification	Isothermal amplification enriching parasite DNA in host background [74]
Sequencing	PromethION Flow Cells R10.4.1 (ONT)	High-throughput long-read sequencing	Updated pore design for improved accuracy [73]
	MinION R10.4.1 Flow Cells (ONT)	Portable, smaller-scale sequencing	Ideal for field applications and rapid genotyping [3]

The validated comprehensive diagnostic pipeline for diverse variant types using Oxford Nanopore long-read sequencing represents a significant advancement for parasite genotyping research. The platform's ability to detect the full spectrum of genomic variation—from SNVs and indels to complex structural variants and repeat expansions—in a single assay addresses critical limitations of previous technologies. For researchers in parasitology and drug development, this pipeline enables more accurate tracking of resistance markers, improved understanding of parasite population dynamics, and enhanced ability to distinguish recrudescence from new infections in clinical trials.

The decreasing costs of long-read sequencing technologies and continuous improvements in basecalling accuracy and bioinformatic tools suggest that this comprehensive approach will become increasingly accessible to researchers worldwide [70]. Future developments will likely focus on enhancing portable sequencing capabilities for field applications, refining targeted enrichment approaches for specific parasite genomes, and integrating multi-omics data for a more comprehensive understanding of host-parasite interactions. For the research community, adopting this validated pipeline offers the opportunity to accelerate discoveries in parasite genomics and contribute to more effective strategies for controlling parasitic diseases.

Adaptive sampling on Oxford Nanopore Technologies (ONT) platforms represents a significant advancement for targeted genomic investigations, enabling real-time, computational enrichment of specific DNA or RNA sequences without the need for physical sample manipulation. This application note assesses the efficacy of this method, summarizing quantitative performance data across studies and detailing standardized protocols for its implementation. While demonstrating robust enrichment capabilities of 5 to 10-fold for genomic DNA, the method shows more modest enrichment (1.3 to 1.9-fold) for transcriptomic applications. When applied to parasite genotyping, this technology offers a powerful tool for characterizing complex antigen genes and conducting large-scale genomic surveillance.

Adaptive sampling is a targeted sequencing strategy available on ONT sequencing instruments that performs real-time selection of DNA or RNA molecules during a sequencing run. The core principle involves basecalling reads as they enter the pore and aligning these short "chunks" of sequence to a user-provided reference file. Based on this real-time alignment, the software decides whether to continue sequencing a molecule or to reverse the voltage and eject it, thereby freeing the pore to capture another strand. This process allows for two primary operational modes: enrichment mode, where only strands mapping to specified regions of interest (ROIs) are sequenced, and depletion mode, where strands mapping to undesired regions are ejected [48]. This method is particularly valuable for cost-effective and rapid analysis of specific genomic loci, such as polymorphic antigen genes in parasites, where it can efficiently focus sequencing resources on the most informative regions.
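
The accept-or-eject logic described above can be illustrated with a toy decision function. This is a conceptual Python sketch, not ONT's implementation: it assumes the first chunk of a read has already been basecalled and aligned, and the names, the half-open interval convention, and the "wait" outcome for unaligned chunks are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), half-open like BED


def decide(hit: Optional[Tuple[str, int]], targets: List[Interval],
           mode: str = "enrichment") -> str:
    """Return 'sequence', 'eject', or 'wait' for a read based on its first aligned chunk.

    hit is (contig, position) for the chunk alignment, or None if the chunk
    did not align yet (more signal is needed before deciding).
    """
    if mode not in ("enrichment", "depletion"):
        raise ValueError("mode must be 'enrichment' or 'depletion'")
    if hit is None:
        return "wait"
    contig, position = hit
    on_target = any(c == contig and start <= position < end for c, start, end in targets)
    if mode == "enrichment":
        return "sequence" if on_target else "eject"
    return "eject" if on_target else "sequence"


if __name__ == "__main__":
    rois = [("chr7", 10_000, 25_000)]  # hypothetical region of interest
    print(decide(("chr7", 12_345), rois))           # inside ROI, enrichment mode
    print(decide(("chr2", 500), rois))              # off target, enrichment mode
    print(decide(("chr7", 12_345), rois, "depletion"))
```

The same comparison serves both modes; only the action taken on an on-target hit is inverted.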

Quantitative Assessment of Enrichment Success

The performance of adaptive sampling varies significantly depending on the application, with genomic DNA studies generally reporting higher enrichment factors than transcriptomic studies. The table below summarizes key performance metrics from recent evaluations.

Table 1: Enrichment Performance of Adaptive Sampling Across Applications

Application / Study Focus	Reported Enrichment Factor	Key Performance Metrics	Noted Limitations
Targeted Genomic Sequencing (Hereditary Cancer Genes) [75] [76]	Median: 10.4× (Range: 5.5 – 14.5×)	On-target depth: ~22×; SNV recall: 98.8%; Effective SV and mobile element insertion detection.	Enrichment decreases when targeting >10% of the genome [48]. Lower coverage compromises SNV recall [75].
cDNA Sequencing [77]	1.3× (in "base proportion")	Performance depends on reference file structure; Gene-based and "master-transcript-based" references performed best.	Short read length and sequencing quality limit performance; Significantly less effective than cDNA hybridization capture [77].
Direct RNA Sequencing [77]	1.9× (in "base proportion")	Can boost target yield within fixed run times.	Modest enrichment due to molecule length; Depletion mode is more efficient than enrichment mode [78].

Detailed Experimental Protocol for Genomic DNA Adaptive Sampling

This protocol is designed for targeted sequencing of genomic DNA, such as for parasite genotyping, using the MinKNOW software on an ONT sequencer.

Pre-sequencing Preparation

Library Preparation: Follow standard ONT library prep protocols for genomic DNA. While adaptive sampling requires no special reagents, library fragmentation is a critical consideration.
Determining DNA Input: Calculate the required DNA input based on molarity (fmol), not mass alone, because the mass corresponding to a given molar amount depends on fragment length. For optimal pore occupancy with V14 chemistry, aim for 50-65 fmol per load.
- Example Calculation: For a library with an N50 of 6.5 kb, 50 fmol is approximately 215 ng. Use a biomath calculator for precise conversions based on your library's fragment size distribution [48].
Reference File Preparation: Create a BED file specifying the coordinates of your ROIs. For optimal enrichment, it is recommended to add a "buffer" region (e.g., 5-10 kbp) to each side of the exact ROI to account for the decision-making latency of the software [48].
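
The molarity-to-mass conversion in the example calculation above can be sketched as follows. The average mass of 660 g/mol per base pair of double-stranded DNA is a common approximation and is an assumption here, as are the function names; using the N50 as the representative fragment length is itself only an estimate.

```python
# Minimal sketch of the fmol <-> ng conversion for double-stranded DNA.
# AVG_BP_MASS is an assumed approximation (about 660 g/mol per base pair).
AVG_BP_MASS = 660.0


def fmol_to_ng(fmol, length_bp, bp_mass=AVG_BP_MASS):
    """Mass in ng of `fmol` femtomoles of dsDNA fragments of `length_bp` base pairs."""
    # fmol * 1e-15 mol/fmol * (length_bp * bp_mass) g/mol * 1e9 ng/g
    return fmol * length_bp * bp_mass * 1e-6


def ng_to_fmol(ng, length_bp, bp_mass=AVG_BP_MASS):
    """Inverse conversion: femtomoles contained in `ng` nanograms of dsDNA."""
    return ng * 1e6 / (length_bp * bp_mass)


if __name__ == "__main__":
    # 50 fmol of a library with a representative length of 6.5 kb
    print(round(fmol_to_ng(50, 6_500), 1))  # 214.5
```

With these assumptions, 50 fmol at 6.5 kb comes to about 215 ng, in line with the figure quoted above; a library with a broad size distribution may deviate from this estimate.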

MinKNOW Run Setup and Execution

Load the Library: Load the prepared library onto the flow cell as usual.
Configure Adaptive Sampling: In the "Run Options" section of MinKNOW, select "Adaptive Sampling" and upload:
- The reference genome FASTA file.
- The BED file containing your buffered ROIs.
- Select "enrichment" mode [48].
Configure Live Analysis (Optional but Recommended): In the "Analysis" section, upload the same reference FASTA and an unbuffered BED file containing only the precise ROIs. This allows MinKNOW to track and display the actual on-target coverage in real-time [48].
Start the Sequencer: Initiate the sequencing run. MinKNOW will now perform real-time basecalling and alignment, ejecting off-target molecules.
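
The buffered BED file used for adaptive sampling can be derived from the unbuffered ROI file used for live analysis. The sketch below is one way to do this under stated assumptions: the function name `buffer_bed_lines`, the `contig_lengths` argument, and the example gene name are hypothetical, and overlapping buffered intervals are not merged.

```python
# Minimal sketch: add a fixed buffer to each side of every BED interval.
# BED coordinates are 0-based, half-open; starts are clamped at 0 and ends
# at the contig length when one is supplied.
def buffer_bed_lines(lines, buffer_bp, contig_lengths=None):
    buffered = []
    for line in lines:
        # Skip blank lines and BED header or comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        new_start = max(0, start - buffer_bp)
        new_end = end + buffer_bp
        if contig_lengths and contig in contig_lengths:
            new_end = min(new_end, contig_lengths[contig])
        # Preserve any extra columns (name, score, strand) unchanged.
        buffered.append("\t".join([contig, str(new_start), str(new_end)] + fields[3:]))
    return buffered


if __name__ == "__main__":
    rois = ["chr1\t2000\t8000\tgeneA"]  # hypothetical ROI
    print(buffer_bed_lines(rois, 5_000, {"chr1": 10_000}))
```

Keeping the unbuffered file as the single source of truth and generating the buffered file from it avoids the two files drifting apart between runs.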

Post-sequencing Data Analysis

Basecalling and Alignment: If live basecalling was not performed, use the standalone Dorado basecaller for post-run basecalling. This can be run on a high-performance computer or cloud instance [5].
Variant Calling and Analysis: Process the resulting BAM files with a pipeline suitable for your application. For parasite genotyping, this would include a specialized variant caller to identify and phase polymorphisms in target antigen genes.

Figure 1: Workflow for Genomic DNA Adaptive Sampling. The process involves careful pre-sequencing preparation, real-time selection during the sequencing run, and subsequent bioinformatic analysis.

Successful implementation of adaptive sampling requires standard sequencing reagents, suitably prepared input material, and adequate computational resources.

Table 2: Essential Materials and Tools for Adaptive Sampling Experiments

Item	Function / Description	Example / Specification
ONT Sequencer & Flow Cell	Platform for generating long reads and executing adaptive sampling.	GridION, PromethION, or MinION Mk1C [75] [48].
Library Prep Kit	Prepares DNA or RNA for sequencing; no special kit is required for AS.	Ligation Sequencing Kits (e.g., SQK-LSK114) for DNA; Direct RNA Kit (SQK-RNA003) for RNA [48] [78].
High-Quality DNA/RNA	Starting material. Fragmentation of DNA is often crucial for performance.	Covaris g-TUBEs or Megaruptor kits for DNA shearing [48].
Reference Files	FASTA and BED files used by MinKNOW for real-time read selection.	BED file with ROIs; reference genome in FASTA format [48].
Computational Resources	For real-time basecalling and post-run analysis.	NVIDIA GPU (≥8 GB memory) for Dorado basecaller; high-performance compute cluster [5].
Basecaller (Dorado)	Production basecaller for converting raw signal to nucleotide sequence.	Available for free download; optimized for NVIDIA GPUs [5].

Limitations and Strategic Considerations

Despite its utility, adaptive sampling has inherent limitations that must be factored into experimental design.

Throughput Trade-off: The process of constantly ejecting off-target reads reduces overall pore occupancy, leading to a lower total sequencing output compared to a whole-genome run [48].
Fragment Length Considerations: The efficiency of enrichment is tied to library fragment size. Sequencing very long fragments to capture small targets is inefficient, as the pore remains occupied for a long time to sequence mostly off-target regions [48].
Transcriptomic Challenges: For RNA and cDNA sequencing, the shorter length of molecules means a larger proportion of the read is sequenced before a rejection decision can be made, resulting in significantly lower enrichment factors (typically <2×) [77] [78].
Computational Demand: Real-time basecalling and alignment require substantial computational power, typically a high-performance GPU, to keep up with data generation [5].
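
The fragment-length and transcriptomic limitations above share one piece of arithmetic: the fixed number of bases read before a decision is a larger share of a short molecule than of a long one. The sketch below illustrates this. The default of 400 bases per decision is an assumed figure for illustration only, not a value taken from the cited studies, and the function name is hypothetical.

```python
# Toy arithmetic: share of a molecule already sequenced when the
# accept-or-eject decision is made. decision_bases=400 is an assumption.
def fraction_sequenced_before_decision(read_length_bp, decision_bases=400):
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    # A molecule shorter than the decision window is fully sequenced anyway.
    return min(1.0, decision_bases / read_length_bp)


if __name__ == "__main__":
    for length in (1_000, 10_000, 200):
        frac = fraction_sequenced_before_decision(length)
        print(f"{length} bp molecule: {frac:.0%} read before decision")
```

Under this assumption, ejecting a 1 kb cDNA molecule saves little pore time because a large part of it has already passed through the pore, whereas ejecting a 10 kb genomic fragment saves most of it.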

Figure 2: Key Factors Influencing Adaptive Sampling Efficacy. The success of an adaptive sampling experiment is determined by an interplay of wet-lab and computational parameters.

Adaptive sampling establishes a flexible and powerful paradigm for targeted long-read sequencing and is highly effective for enriching genomic regions, with a reported median enrichment of about 10× (range 5.5 – 14.5×) in targeted genomic studies [75] [76]. Its application within parasite genotyping research is particularly promising for overcoming challenges in haplotype phasing and structural variant detection in complex antigen families. By following the protocol outlined here and carefully considering its inherent limitations, particularly the throughput trade-off and the modest efficacy in transcriptomic applications, researchers can strategically deploy this technology to accelerate genomic surveillance and vaccine antigen discovery.

Evaluating Taxonomic Profiling and Diversity Metrics in Microbiome Studies

The accurate characterization of microbial communities is fundamental to advancing research in human health, environmental science, and infectious diseases. For parasite genotyping and microbiome studies, long-read nanopore sequencing has emerged as a transformative technology that provides enhanced resolution for differentiating closely related species and strains. This capability is particularly valuable for studying complex microbial communities and genetically diverse pathogens like Plasmodium species, where accurate taxonomic profiling is essential for understanding transmission dynamics, drug resistance, and vaccine development.

This application note provides a comprehensive framework for evaluating taxonomic profiling tools and diversity metrics within the specific context of long-read sequencing data. We present standardized protocols, comparative performance metrics, and practical guidance to help researchers implement robust, reproducible microbiome analysis pipelines tailored for parasite genotyping research.

Table 1: Key Alpha Diversity Metrics for Microbiome Analysis

Metric Category	Specific Metrics	Key Features	Biological Interpretation	Considerations for Parasite Studies
Richness	Chao1, ACE, Observed ASVs	Estimates number of taxa; Chao1 uses singletons/doubletons	Species richness in a sample	Sensitive to sequencing depth; useful for detecting rare parasites
Phylogenetic Diversity	Faith's PD	Sum of branch lengths in phylogenetic tree	Evolutionary diversity captured	Valuable when studying related parasite strains
Information Theory	Shannon, Brillouin	Combines richness and evenness	Overall diversity accounting for abundance distribution	Higher values indicate more diverse parasitic communities
Dominance/Evenness	Simpson, Berger-Parker, Gini	Measures abundance distribution inequality	Dominance of most abundant taxa	Identifies dominant parasite species in mixed infections
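
As an illustration of how several metrics in the table are computed from a vector of per-taxon counts, here is a minimal sketch. The function names are hypothetical, the Simpson index is given in its Gini-Simpson form (1 minus the sum of squared proportions), and Chao1 uses the bias-corrected formula; established packages may differ in defaults.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one observation."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon entropy (natural log) of the abundance distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def berger_parker(counts):
    """Proportion of the single most abundant taxon."""
    return max(counts) / sum(counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    counts = [50, 30, 10, 5, 2, 1, 1, 1]  # hypothetical read counts per taxon
    print(observed_richness(counts), chao1(counts), berger_parker(counts))
```

For the example counts, three singletons and one doubleton raise the Chao1 estimate from 8 observed taxa to 9.5, which shows why removing singletons during denoising changes this metric.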

Table 2: Performance Comparison of Long-Read Taxonomic Profilers

Tool	Classification Approach	Database Size	Reported Recall	Reported Precision	Computational Requirements
Lemur	Marker-based (EM algorithm)	4.1 GB	0.951-1.000	0.596-0.703	~32 GB RAM; runs on laptop
Melon	Marker-based (two-stage)	8.9 billion bp (compressed)	0.963	0.929	Standard laptop feasible
Kraken 2	k-mer based	Varies (typically large)	0.976-1.000	0.055-0.589	High RAM requirements
MetaMaps	Read mapping (succinct index)	Varies	0.960-1.000	0.009-0.909	Moderate to high
Sourmash	k-mer based	Varies	0.800-0.927	0.727-0.938	Moderate
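
Recall and precision values such as those in the table are typically computed by comparing the set of taxa a profiler reports against the known composition of a mock community. A minimal sketch, assuming presence/absence evaluation at a single taxonomic rank and using hypothetical taxon labels:

```python
def precision_recall(predicted, truth):
    """Presence/absence precision and recall for a set of predicted taxa."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: share of reported taxa that are really present.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly present taxa that were reported.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = {"taxon_A", "taxon_B", "taxon_C", "taxon_D"}  # hypothetical mock community
    predicted = {"taxon_A", "taxon_B", "taxon_C", "taxon_X", "taxon_Y", "taxon_Z"}
    print(precision_recall(predicted, truth))  # (0.5, 0.75)
```

A tool with high recall but low precision reports most true taxa along with many false positives, which is the pattern the table shows for some k-mer and read-mapping approaches.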

Protocols for Taxonomic Profiling with Long-Read Data

Protocol 1: Sample Preparation and Library Construction for Parasite Genotyping

Principle: High-quality DNA extraction and proper library preparation are critical for successful long-read metagenomic studies, particularly for complex parasite samples.

Materials:

DNeasy PowerSoil Pro Kit (Qiagen) or similar
Oxford Nanopore Ligation Sequencing Kit (SQK-LSK114)
PCR Barcoding Expansion 96 (EXP-PBC096)
Agilent Bravo Automated Liquid Handling Platform (optional)

Procedure:

DNA Extraction: Extract genomic DNA from parasite cultures or infected samples using the DNeasy PowerSoil Pro Kit following manufacturer's instructions.
Quality Assessment: Verify DNA purity and quantity using NanoDrop, Qubit, and TapeStation measurements. Ensure high molecular weight DNA is preserved.
Input Normalization: Normalize DNA input to 1 µg for library preparation.
Library Preparation: Perform parallel manual and automated library preparations using the Ligation Sequencing Kit and PCR Barcoding Expansion 96.
- Manual Protocol: Follow manufacturer's protocol with incubation at 37°C during bead purification to preserve long fragments.
- Automated Protocol: Use Bravo platform with equivalent reagent volumes, noting temperature control limitations.
Pooling and Sequencing: Normalize the barcoded libraries, pool them in equal amounts, and sequence on an R10.4.1 PromethION flow cell.

Notes: Automated library preparation may yield slightly shorter read lengths due to limitations in temperature control during bead purification, but provides higher throughput and reproducibility [79].

Protocol 2: Taxonomic Profiling Using Marker-Based Approaches

Principle: Marker-based taxonomic profilers leverage conserved, single-copy genes to provide accurate taxonomic abundance profiles, representing the fraction of cells rather than sequencing reads.

Materials:

Melon software (available at https://github.com/treangenlab/melon)
Lemur software (available at https://github.com/treangenlab/lemur)
Marker gene database (e.g., Melon's curated RPG database)

Procedure:

Data Preprocessing:
- Basecall raw signals using Guppy (v7.1.4+)
- Remove adapters using Dorado (v0.6.0+)
- Perform quality filtering (Q-score ≥7)

Marker-Based Classification with Melon:
- Extract reads covering marker genes using protein database (468,432 unique sequences)
- Map marker-containing reads to nucleotide database (compressed from 310,881 RefSeq assemblies)
- Run the two-stage classification (marker-read extraction followed by nucleotide-level mapping, as listed above)
Validation with Magnet (Optional):
- Confirm presence/absence of specific microbial genomes
- Reduce false positives in complex samples
Output Interpretation:
- Analyze taxonomic abundance profiles (genome copies/total detected)
- Compare with sequence abundance from DNA-to-DNA methods
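
The quality-filtering step in the preprocessing above (Q-score ≥7) can be sketched for a single FASTQ quality string. This assumes Phred+33 encoding and the common convention of averaging per-base error probabilities rather than per-base Q values; the function names are hypothetical and specific tools may compute the read-level score differently.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean read Q-score computed in error-probability space (Phred+33 assumed)."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each quality character to an error probability: p = 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(quality_string, min_q=7.0):
    """True if the read's mean Q-score meets the threshold."""
    return mean_qscore(quality_string) >= min_q


if __name__ == "__main__":
    for qual in ("IIII", "++++", "$$$$", "I$"):
        print(qual, round(mean_qscore(qual), 2), passes_filter(qual))
```

Averaging in probability space means a few very low-quality bases pull the read-level score down sharply: a read that is half Q40 and half Q3 scores about Q6, not Q21.5.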

Notes: Melon specifically uses ribosomal protein genes (RPGs) as markers due to their low mutation rates and essential role in protein synthesis, making them ideal for prokaryotic classification [80].
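
The difference between sequence abundance (fraction of sequenced bases) and taxonomic abundance (fraction of genome copies) can be made concrete with a simplified sketch. This is not how Melon estimates genome copies, which relies on marker genes; here genome copies are approximated by dividing sequenced bases by an assumed genome size, and all names and numbers are hypothetical.

```python
def sequence_abundance(base_counts):
    """Fraction of sequenced bases assigned to each taxon."""
    total = sum(base_counts.values())
    return {taxon: bases / total for taxon, bases in base_counts.items()}


def taxonomic_abundance(base_counts, genome_sizes):
    """Fraction of genome copies per taxon (bases divided by genome size)."""
    copies = {taxon: base_counts[taxon] / genome_sizes[taxon] for taxon in base_counts}
    total = sum(copies.values())
    return {taxon: c / total for taxon, c in copies.items()}


if __name__ == "__main__":
    base_counts = {"taxon_A": 8_000_000, "taxon_B": 2_000_000}   # hypothetical
    genome_sizes = {"taxon_A": 4_000_000, "taxon_B": 1_000_000}  # hypothetical
    print(sequence_abundance(base_counts))
    print(taxonomic_abundance(base_counts, genome_sizes))
```

In this example the larger genome accounts for 80% of the sequenced bases yet only half of the genome copies, which is why read-based and marker-based profiles of the same sample can disagree.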

Experimental Workflow for Diversity Analysis

The following diagram illustrates the complete workflow for taxonomic profiling and diversity analysis in parasite microbiome studies:

Figure 1: Workflow for Taxonomic Profiling and Diversity Analysis in Microbiome Studies

Table 3: Key Research Reagent Solutions for Parasite Microbiome Studies

Category	Specific Product/Resource	Function/Application
Sequencing Kits	Oxford Nanopore Ligation Sequencing Kit (SQK-LSK114)	Library preparation for long-read metagenomic sequencing
DNA Extraction	DNeasy PowerSoil Pro Kit (Qiagen)	High-quality DNA extraction from complex samples
Automation	Bravo Automated Liquid Handling Platform	High-throughput, reproducible library preparation
Taxonomic Profiling	Melon Taxonomic Profiler	Marker-based classification for long-read data
	Lemur & Magnet Tool Suite	Lightweight profiling and validation
Reference Databases	Curated RPG Database (Melon)	468,432 unique sequences for marker-based classification
	NCBI RefSeq/GTDB	Comprehensive genomic references for nucleotide mapping
Validation Standards	ZymoBIOMICS Microbial Standards	Mock communities for pipeline validation

Applications in Parasite Genotyping Research

Long-read sequencing approaches have demonstrated particular utility in parasite research, where genetic diversity and complex antigen variation present challenges for short-read technologies. A specialized genomic surveillance platform has been developed for genotyping Plasmodium antigens rich in structural polymorphisms using long-read circular consensus sequencing. This platform enables processing of up to 384 multiclonal isolates in a single run, providing critical epidemiological insights into community spread of infection [81].

For rodent-infectious Plasmodium species such as P. yoelii, which are important model organisms for studying the mosquito and liver stages of development, high-quality genome assemblies generated with PacBio sequencing have revealed biologically meaningful differences between strains that were previously obscured in fragmented assemblies [82]. These advances in genomic characterization provide the foundation for more accurate taxonomic profiling in experimental malaria studies.

Analysis and Interpretation of Diversity Metrics

When interpreting alpha diversity metrics in parasite microbiome studies, researchers should consider several critical factors:

Metric Selection: Richness estimators (Chao1, ACE) are particularly sensitive to rare taxa, making them valuable for detecting low-abundance parasites in mixed infections. Phylogenetic diversity (Faith's PD) provides evolutionary context when studying related parasite strains [83] [84].
Technical Considerations: Note that some denoising algorithms (e.g., DADA2) remove singletons as part of their process, which impacts metrics like Chao1 that rely on these rare variants [83].
Study Design Implications: Beta diversity metrics (Bray-Curtis, UniFrac) are generally more sensitive for detecting differences between sample groups than alpha diversity metrics, potentially requiring smaller sample sizes to achieve statistical power [84].
Standardization Needs: Consistent application of diversity metrics across studies is essential for comparative analysis. A recent guidelines paper recommends including metrics representing richness, phylogenetic diversity, entropy, dominance, and estimates of unobserved microbes as a comprehensive set [83].
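
Of the beta diversity metrics mentioned above, Bray-Curtis dissimilarity is the simplest to compute from two abundance vectors over the same ordered set of taxa. A minimal sketch with a hypothetical function name and example counts:

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa in the same order")
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    denominator = sum(sample_a) + sum(sample_b)
    # Two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(bray_curtis([6, 4], [4, 6]))    # 0.2
    print(bray_curtis([10, 0], [0, 10]))  # 1.0
```

The value ranges from 0 for identical profiles to 1 for samples that share no taxa. UniFrac additionally requires a phylogenetic tree and is not shown here.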

The integration of long-read sequencing with appropriate taxonomic profiling tools and diversity metrics provides a powerful framework for advancing parasite genotyping research. Marker-based approaches like Melon and Lemur offer specific advantages for long-read data, including reduced computational requirements and more biologically meaningful abundance estimates. When combined with standardized protocols for library preparation and data analysis, these methods enable researchers to overcome traditional challenges in microbiome study design and interpretation.

As long-read technologies continue to evolve in accuracy and throughput, their application in taxonomic profiling and diversity assessment will play an increasingly important role in understanding complex parasite communities, ultimately supporting the development of improved diagnostics, therapeutics, and vaccines for parasitic diseases.

Conclusion

Long-read Nanopore sequencing has matured into a powerful, versatile platform for parasite genotyping, capable of delivering high-fidelity data comparable to short-read technologies while providing unparalleled insights into complex genomic regions and structural variations. Its portability and real-time analysis potential are revolutionizing field-based genomic surveillance. However, successful implementation requires careful attention to sample quality, an understanding of platform-specific error modes, and strategic selection of wet-lab and computational methods—choosing between whole-genome sequencing, adaptive sampling, or targeted AmpSeq based on the specific research question. Future directions will focus on refining enrichment strategies for low-input samples, developing integrated bioinformatics pipelines for clinical diagnosis, and leveraging the technology's potential for rapid response to emerging drug resistance and outbreaks, ultimately strengthening global infectious disease control efforts.