What Is Long-Read Low-Pass Sequencing?

Long-read low-pass sequencing, or LRLP, is a whole-genome sequencing approach that combines two ideas: long, accurate reads from technologies like PacBio HiFi, and low sequencing coverage per sample.

Together, these two ideas solve a problem the field has been working around for years. Researchers wanted the accuracy and resolution of long-read sequencing, but the cost of running every sample at high coverage made population-scale studies impractical. LRLP closes that gap. It delivers long-read data quality at coverage levels low enough to multiplex dozens of samples per sequencing cell.

This post explains how LRLP works, what it detects, how it compares to other methods, and where it fits in modern genomics.

The Two Concepts Behind LRLP

Long-Read Sequencing

Traditional short-read sequencing technologies generate reads of about 150 base pairs. Long-read sequencing generates reads thousands to tens of thousands of base pairs long. PacBio HiFi reads, for example, are typically 10,000 to 20,000 base pairs and carry per-base accuracy above 99.9%.
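To put the accuracy figure in familiar units, here is a minimal sketch that converts per-base accuracy to a Phred-scaled quality score and estimates the expected number of errors in one read. The function names are illustrative, and the error estimate assumes errors are independent and uniform along the read, which is a simplification.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.999) to a Phred-scaled quality score."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(read_length_bp: int, accuracy: float) -> float:
    """Expected erroneous bases in one read, assuming independent, uniform errors."""
    return read_length_bp * (1.0 - accuracy)


if __name__ == "__main__":
    q = phred_quality(0.999)
    errs = expected_errors(15_000, 0.999)
    print(f"Q{q:.0f}, about {errs:.0f} expected errors in a 15 kb read")
```

An accuracy of 99.9% corresponds to Q30, so reads above that threshold are Q30 or better.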

That length matters. Short reads cannot span repetitive regions, complex structural variants, or large insertions and deletions. When the reference genome contains sequences that look identical in multiple places, a short read cannot be placed uniquely. Long reads span those regions in a single contiguous read, which means alignment is unambiguous and variant calling is reliable in places short reads cannot reach.

Those places include centromeres, telomeres, GC-rich promoters, repetitive transposable elements, and the highly rearranged regions of sex chromosomes. They also include the homeologous chromosomes of polyploid species, where short reads frequently misalign and produce noisy genotype calls.

Low-Pass Sequencing

Sequencing coverage refers to how many times each base in the genome is sequenced on average. Deep coverage means reading each base 30 times or more. Low-pass coverage means reading it far fewer times, sometimes under five times per haplotype.

Lower coverage means lower cost per sample. It also means many samples can be pooled together in a single sequencing run, reducing the cost of large studies.
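The coverage definition above reduces to simple arithmetic: total sequenced bases divided by genome size. The sketch below shows that relationship and its inverse. The genome size and read length in the example are illustrative values, not figures from any particular study.

```python
import math


def mean_coverage(n_reads: int, mean_read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is sequenced."""
    return n_reads * mean_read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, mean_read_length_bp: int, genome_size_bp: int) -> int:
    """Number of reads required to reach a target mean coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)


if __name__ == "__main__":
    genome = 1_000_000_000  # hypothetical 1 Gb genome
    for depth in (30, 2):
        n = reads_needed(depth, 15_000, genome)  # assumed 15 kb mean read length
        print(f"{depth}X needs about {n:,} reads of 15 kb")
```

Because the read count scales linearly with target depth, dropping from deep to low-pass coverage cuts the sequencing needed per sample by the same factor.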

The historical concern with low-pass sequencing is data quality. With short reads, low coverage in difficult genomic regions produces unreliable genotype calls, which is why researchers have traditionally needed deep coverage to call variants confidently.

Long reads change that calculation. Because each read carries more information and aligns more reliably, low coverage with long reads delivers higher-quality data than the same coverage with short reads. This is the insight that makes LRLP work.

How LRLP Combines Both

The combination is what makes LRLP distinct from either method on its own. You get the resolution and accuracy of long-read sequencing, applied at coverage levels and per-sample costs that make population-scale studies viable.

In practical terms, LRLP allows researchers to multiplex 48 to 96 samples per PacBio Revio sequencing cell, depending on genome size and target coverage. A study that previously required dozens of deep-sequencing runs can now be completed in a small number of multiplexed runs, with variant calling quality that exceeds short-read low-pass sequencing across every variant class.
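How many samples fit on one cell follows from the cell's total yield, the genome size, and the target depth. The sketch below is a back-of-the-envelope calculation only. The 90 Gb yield is a hypothetical placeholder, not a specification, and real runs lose some yield to uneven pooling and unassigned reads.

```python
import math


def samples_per_cell(cell_yield_bp: float, genome_size_bp: float, target_coverage: float) -> int:
    """Maximum number of samples that reach target_coverage from one cell.

    Assumes reads are split perfectly evenly across samples, which real
    pooled libraries only approximate.
    """
    bases_per_sample = genome_size_bp * target_coverage
    return math.floor(cell_yield_bp / bases_per_sample)


if __name__ == "__main__":
    ASSUMED_CELL_YIELD_BP = 90e9  # hypothetical yield; use the figure from your own runs
    for genome_bp, depth in [(1e9, 1.0), (1e9, 1.5), (2.5e9, 1.0)]:
        n = samples_per_cell(ASSUMED_CELL_YIELD_BP, genome_bp, depth)
        print(f"genome {genome_bp / 1e9:.1f} Gb at {depth}X: {n} samples per cell")
```

The calculation makes the trade-off explicit: larger genomes or higher target depth reduce the multiplex level, which is why the achievable sample count depends on both.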

This is not a small efficiency gain. It changes which studies are economically feasible.

How LRLP Works in Practice

The LRLP workflow has four stages. Each one affects data quality, so each one matters.

1. High Molecular Weight (HMW) DNA Extraction. Long-read sequencing requires intact, high-quality DNA. Fragmented or degraded samples will produce shorter reads and lower-quality data. For LRLP at scale, plate-based extraction protocols handle 96 samples in parallel using robotic automation, maintaining consistent DNA quality across the entire batch.

2. Library Preparation and Multiplexing. Each DNA sample is converted into a sequencing library and tagged with a unique molecular barcode. Tagged libraries from multiple samples are then pooled into a single multiplexed sample. The barcode is what allows reads from each individual sample to be separated after sequencing, even though they were all sequenced together. For PacBio HiFi LRLP, multiplexing of 48 to 96 samples per cell is standard.

3. Sequencing on PacBio Revio. The multiplexed library is sequenced on a single PacBio Revio cell. The Revio platform generates HiFi reads that are both long and accurate, at high throughput. Each sample receives a fraction of the total reads, producing low-pass coverage per individual sample while sequencing the entire pool.

4. Bioinformatic Analysis. Reads are demultiplexed back to their original samples using the barcodes, aligned to a reference genome, and analyzed for variants. SNP calling, indel detection, structural variant identification, and haplotype phasing all happen in this stage. The choice of analysis pipeline matters significantly to the final data quality, particularly for SV detection and phasing.

This is the full workflow. Each stage is established, documented, and scalable to large studies.
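The barcode step in stages 2 and 4 can be illustrated with a toy demultiplexer. This is a simplified sketch, not how production tools work: it assumes each read begins with an exact, fixed-length barcode, whereas real demultiplexers tolerate sequencing errors and handle barcodes on both ends of the molecule. The barcode sequences and sample names are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Assign each read to a sample by exact match on its leading barcode.

    reads: iterable of read sequences (strings) that start with a barcode
    barcode_to_sample: dict mapping barcode sequence -> sample name
                       (all barcodes are assumed to have the same length)
    Returns a dict of sample name -> list of reads with the barcode trimmed off.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical barcodes
    pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAAAAA", "NNNNGGGGGG"]
    for sample, inserts in demultiplex(pooled, barcodes).items():
        print(sample, inserts)
```

Reads whose barcode is not recognized land in an "unassigned" bin, which is one reason the usable yield per sample is a little lower than the cell's raw output divided by the number of samples.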

Coverage Requirements: Why Low Pass Works with Long Reads

The most common question about LRLP is how variant calling stays reliable at low coverage. The answer comes down to information density per read.

A 150-base-pair short read carries information about 150 bases. A 20,000-base-pair HiFi read carries information about 20,000 bases. At the same nominal coverage, long reads cover more of the genome usefully because each read anchors confidently in unique sequence and extends through neighboring regions.

For LRLP, the typical target is under 3X coverage per haplotype. At that depth, long reads reliably call SNPs, indels, and structural variants across most of the genome. Imputation methods further improve genotype accuracy by leveraging linkage information across the population being sequenced.

For comparison, short-read sequencing requires roughly 30X coverage to achieve comparable variant calling reliability in most genomic regions, and short reads still cannot call structural variants confidently even at deep coverage. The economics are fundamentally different.
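The depth figures above can be put in rough numbers with a standard simplification: treat the depth at any one base as Poisson-distributed around the mean coverage. This is an assumption, not something LRLP pipelines necessarily use, and it ignores mappability, which is exactly where long reads have their advantage. It only shows how many positions receive at least a given number of reads.

```python
import math


def prob_depth_at_least(k: int, mean_coverage: float) -> float:
    """P(depth >= k) at a single base, assuming depth ~ Poisson(mean_coverage)."""
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for coverage in (1.0, 3.0, 30.0):
        p1 = prob_depth_at_least(1, coverage)
        p2 = prob_depth_at_least(2, coverage)
        print(f"{coverage:>4}X: P(>=1 read) = {p1:.3f}, P(>=2 reads) = {p2:.3f}")
```

Under this model, about 95% of bases are covered by at least one read at 3X mean coverage. The remaining positions, and those seen only once, are where imputation across the population does its work.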

What LRLP Detects

LRLP captures the full spectrum of genomic variation in a single experiment:

Single nucleotide polymorphisms (SNPs). Standard point mutations, called accurately even at low coverage because long reads anchor reliably in surrounding sequence.

Small insertions and deletions (indels). Insertions and deletions of 1 to 50 base pairs, detected with base-pair resolution.

Structural variants (SVs). Large insertions, deletions, inversions, translocations, and copy number variants greater than 50 base pairs. SVs are largely invisible to short-read sequencing and SNP arrays, and they are increasingly recognized as drivers of complex traits and disease.

Haplotype phasing. Long reads physically link variants on the same chromosome, allowing phasing across multi-kilobase to megabase blocks without requiring parental data or statistical inference.

Methylation. PacBio HiFi reads carry direct methylation information without requiring bisulfite conversion, enabling epigenetic analysis from the same data.

This is the core advantage. One LRLP experiment produces a complete variant profile across all of these classes. Researchers no longer need to run separate experiments to detect different kinds of variation.
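The phasing point can be made concrete with a toy model. In the sketch below, two heterozygous sites are considered linked when a single read spans both, and a phase block is a set of sites connected through such links. This only illustrates why read length drives block size. Real phasing tools also assign alleles to haplotypes and account for sequencing errors. The site positions and read intervals are invented.

```python
def phase_blocks(het_sites, reads):
    """Group heterozygous sites into blocks connected by shared reads.

    het_sites: sorted list of genomic positions
    reads: list of (start, end) half-open intervals
    Returns a sorted list of blocks, each a list of site positions.
    """
    parent = {s: s for s in het_sites}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for start, end in reads:
        covered = [s for s in het_sites if start <= s < end]
        # Every pair of neighbouring sites on the same read is physically linked.
        for a, b in zip(covered, covered[1:]):
            parent[find(a)] = find(b)

    blocks = {}
    for s in het_sites:
        blocks.setdefault(find(s), []).append(s)
    return sorted(blocks.values())


if __name__ == "__main__":
    sites = [1_000, 3_000, 9_000, 14_000]
    short_reads = [(950, 1_100), (2_950, 3_100), (8_900, 9_050), (13_950, 14_100)]
    long_reads = [(0, 10_000), (8_000, 20_000)]
    print("short reads:", phase_blocks(sites, short_reads))
    print("long reads: ", phase_blocks(sites, long_reads))
```

With 150 bp reads, no read covers two sites, so every site is its own block. Two overlapping long reads are enough to join all four sites into one.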

How LRLP Compares to Other Methods

LRLP sits between several established methods. Understanding the comparison clarifies when each method is the right choice.

| Method | Variant Detection | Genome Coverage | Cost per Sample | Best For |
|---|---|---|---|---|
| SNP Arrays | SNPs only, at fixed positions | Marker set only | Low | High-volume genotyping of known variants in well-characterized species |
| Genotyping-by-Sequencing (GBS) | SNPs in accessible regions | Reduced representation | Low | Diversity studies in species without good array options |
| Short-Read Low-Pass WGS | SNPs, some indels | Whole genome, limited in complex regions | Moderate | Population genotyping where SVs are not a priority |
| LRLP (Long-Read Low-Pass) | SNPs, indels, SVs, methylation, phased haplotypes | Whole genome including complex regions | Moderate | Population-scale studies needing complete variant detection |
| Deep Long-Read WGS | Everything LRLP detects, at higher resolution | Whole genome | High | Reference genome assembly, individual-level deep characterization |

Compared to SNP arrays: Arrays detect only the variants the array was designed for. LRLP detects every variant in the sample, including rare alleles, novel variants, and structural variants the array cannot see. (For more on the array-to-sequencing transition, see How to Transition from SNP Arrays to Sequencing in Your Breeding Program.)

Compared to short-read low-pass sequencing: Both use low coverage and multiplexing to lower per-sample cost. The difference is data quality. In a 130-line peanut diversity panel, LRLP detected 27,942 variants compared to 2,483 with short-read low-pass sequencing on the same samples, roughly an 11-fold difference. The gap reflects the fundamental limitation of short reads in complex regions, not a coverage deficit. (Lee et al. 2025, bioRxiv preprint)

Compared to deep long-read sequencing: Deep coverage delivers the highest resolution per sample but at a cost that makes large cohorts impractical. LRLP captures the variants that matter for most research applications at a fraction of the per-sample cost.

Compared to GBS: GBS samples a fraction of the genome and is limited to common variants in accessible regions. LRLP is whole-genome and captures variation across the entire sequence, including in regions GBS cannot access.

Where LRLP Is Used

LRLP applies across several major areas of genomics research:

Plant and animal breeding. Population-scale genotyping for genomic selection, GWAS, marker-assisted selection, and variety development. Particularly valuable in polyploid crops where short-read methods struggle.

Human health research. Cohort studies, rare disease research, structural variant detection in neurodegenerative disease, and population genomics in underrepresented groups where reference bias is a known problem.

Biodiversity and conservation genomics. Population-level studies in non-model organisms, often without a high-quality reference genome, where long reads enable de novo discovery of variation.

Pathogen surveillance and microbiome research. Detection of structural variation and complex genomic rearrangements that drive resistance, virulence, and ecological function.

When to Use LRLP

LRLP is the right method when several of the following are true:

  • You are sequencing populations rather than individual samples.

  • Your species is polyploid, has a complex genome, or lacks a high-quality reference.

  • Structural variants are likely to be relevant to your research question.

  • You have hit a wall with arrays, GBS, or short-read low-pass sequencing and need broader variant detection.

  • You want phased haplotype information without requiring trio data.

  • You are working in difficult genomic regions like sex chromosomes, repetitive elements, or telomeric regions.

It is generally not the right method if you only need to genotype a small number of common variants in well-characterized genomic regions and a SNP array already exists for your species. Arrays are still faster and cheaper for that specific use case.

For most other applications at population scale, LRLP outperforms the alternatives.

Frequently Asked Questions

Is LRLP the same as low-coverage long-read sequencing? Yes. The terms are used interchangeably. "Low-pass" and "low-coverage" both refer to sequencing each sample at a fraction of the depth used for deep WGS, typically under 5X per haplotype.

Does LRLP require a reference genome? A reference genome significantly improves variant calling accuracy, but LRLP can be used in species without a high-quality reference. For non-model organisms, long reads also support de novo assembly approaches that can establish a reference from the same data.

Can LRLP detect rare variants? Yes. Because LRLP sequences the entire genome rather than predefined marker positions, rare variants and population-specific alleles are detected as long as they fall within sequenced reads. Imputation methods further improve rare variant recovery at the population level.

What sample types work with LRLP? Any sample that yields high molecular weight DNA. This includes fresh and frozen tissue, leaf material from plants, blood, and properly preserved environmental samples. Highly degraded samples are not suitable for long-read sequencing.

How long does LRLP take from sample submission to results? Turnaround time depends on study size, multiplex level, and analysis requirements. 

Is LRLP more expensive than short-read sequencing? Per sequencing run, long-read costs more than short-read. Per useful variant detected, particularly when SVs and complex regions are included, LRLP is typically more cost-effective at population scale. The cost comparison depends on what is being measured.

Can LRLP data be combined with existing short-read data? Yes, with appropriate statistical methods. Integrating short-read and long-read datasets is an active area of methodology development and is feasible for cohort extension and longitudinal studies.

The Bottom Line

Long-read low-pass sequencing is not a feature of long-read sequencing. It is a distinct method that changes what is economically feasible at the population scale.

The combination of long-read accuracy and low-coverage cost efficiency means researchers no longer have to choose between variant detection quality and study scale. They can have both.

For breeding programs, cohort studies, biodiversity research, and any application where the question requires sequencing many samples, LRLP is the method that makes the data both rich enough and affordable enough to answer it.

Have a project where LRLP might fit? Talk to a scientist or request a quote.
