How to Transition from SNP Arrays to Sequencing in Your Breeding Program

plant breeding

May 20

If your breeding program runs on SNP arrays, you are not alone. Arrays have been the workhorse of plant genomics for two decades. They are reliable, fast, and most of your team already knows how to use them.

But you have probably started to notice the ceiling.

Markers are fixed to the array design. Rare variants in your population are invisible. Structural variants (SVs), which are increasingly linked to yield, stress tolerance, and complex traits, are not on the panel. And when you apply your array pipeline to a polyploid crop, the results get messier.

This is not a failure of your program. It is a limitation of the technology.

Sequencing does not share those limitations, and the gap between what arrays cost and what sequencing costs has narrowed to the point where the comparison deserves a serious look.

This post covers what the transition actually looks like, what you gain, and how to get started without dismantling your existing workflow.

Why Array-Based Programs Are Hitting a Wall

SNP arrays were designed for a specific job: fast, cheap genotyping across a fixed set of known markers. They do that job well.

The problem is that the fixed marker set is both their strength and their constraint.

Arrays only detect variation at positions defined when the array was designed. If a new variant emerges in your breeding population, or if the variant driving the trait you care about was never on the panel, you will not find it. You are not reading the genome; you are checking a list of predetermined sites.

This creates three specific problems for plant breeders:

Rare variant blindness. Array markers are chosen from variants that were common in the reference populations used to design the array. Rare alleles, population-specific variants, and novel mutations in your lines are systematically missed.

No structural variants. SNP arrays genotype single positions and are not designed to detect insertions, deletions, inversions, or copy number changes larger than a few base pairs. Structural variants are increasingly understood to drive major agronomic traits, including flowering time, disease resistance, and grain architecture. Array-based GWAS will not find them directly.

Polyploid complexity. In allotetraploid and hexaploid species, short probes on an array often cannot confidently distinguish between homeologous chromosomes. The result is misclassified genotypes, inflated heterozygosity calls, and downstream noise in your selection models.
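To make the inflated-heterozygosity problem concrete, here is a toy Python sketch. It assumes that a probe unable to tell homeologs apart simply reports the average allele signal across every copy it binds, and it uses a threshold-based diploid caller with illustrative cutoffs; real array clustering software is more sophisticated.

```python
def pooled_alt_fraction(copy_alt_fractions: list[float]) -> float:
    """Average alternate-allele signal across every chromosome copy a probe binds.

    Toy assumption: a probe that cannot distinguish homeologs receives equal
    signal from each copy, so the array reports the mean.
    """
    return sum(copy_alt_fractions) / len(copy_alt_fractions)


def call_diploid(alt_fraction: float) -> str:
    """Illustrative diploid caller; the 0.2 / 0.8 thresholds are hypothetical."""
    if alt_fraction < 0.2:
        return "hom_ref"
    if alt_fraction > 0.8:
        return "hom_alt"
    return "het"


# An inbred allotetraploid line: homozygous within each subgenome, but the
# A and B homeologs carry different bases at this position.
subgenome_a = 0.0  # every A-subgenome copy carries the reference base
subgenome_b = 1.0  # every B-subgenome copy carries the alternate base
signal = pooled_alt_fraction([subgenome_a, subgenome_b])
print(signal, call_diploid(signal))  # 0.5 het
```

A fully homozygous line is reported as heterozygous. A caller that sees each subgenome separately would instead report two homozygous genotypes, one per homeolog.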

What Long-Read Sequencing Actually Gives You

Moving to sequencing — specifically whole-genome sequencing (WGS) — means you are reading the actual DNA rather than probing a fixed set of positions. Every variant that exists in your sample has the opportunity to be detected.

But not all WGS is the same.

In a 130-line peanut (Arachis hypogaea) diversity panel, long-read low-pass (LRLP) sequencing detected 27,942 total variants versus 2,483 with short-read low-pass sequencing on the same samples. That is more than 11 times the variant yield. Across agronomically important traits like days to flowering and fruit set timing, LRLP identified substantially more associated marker regions — including associations that short-read sequencing missed entirely. (Lee et al. 2025, bioRxiv preprint)

That gap is not a sequencing depth issue but a fundamental difference in what the technology can resolve.

Long reads span repetitive regions and resolve homeologous chromosomes in polyploids, which are exactly the regions where short reads cannot align uniquely and array probes cannot bind specifically. They also detect SVs with base-pair breakpoint precision.

Low-Pass Sequencing: The Bridge Between Arrays and Deep WGS

One of the biggest misconceptions about the array-to-sequencing transition is that you need deep sequencing coverage to replace your array data. You do not.

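Low-pass whole-genome sequencing, typically under 5x coverage per haplotype, captures genome-wide variation at a fraction of the cost of deep sequencing. At that depth many sites receive few or no reads, so low-pass data is paired with statistical imputation, which infers the missing genotypes from haplotypes shared with other lines in the population or a reference panel. Together, the two can recover SNPs, indels, and structural variants across the genome.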
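Why low-pass data needs imputation follows from simple sampling arithmetic. The Python sketch below assumes that reads land on each haplotype as an independent Poisson process whose mean equals the per-haplotype coverage λ, and it ignores sequencing error and mapping bias. Under those assumptions, a diploid site gets no reads at all with probability exp(-2λ), and a heterozygous site shows both alleles with probability (1 - exp(-λ))^2.

```python
import math


def p_site_uncovered(coverage_per_haplotype: float, ploidy: int = 2) -> float:
    """Probability that no read covers a site.

    Assumes each haplotype's depth is Poisson with mean coverage_per_haplotype,
    so the total depth is Poisson(ploidy * coverage_per_haplotype).
    """
    return math.exp(-ploidy * coverage_per_haplotype)


def p_het_both_alleles_seen(coverage_per_haplotype: float) -> float:
    """Probability that a diploid heterozygous site has >= 1 read from each haplotype.

    The two haplotypes are sampled independently under the Poisson assumption,
    giving (1 - exp(-lambda))^2.
    """
    return (1.0 - math.exp(-coverage_per_haplotype)) ** 2


for lam in (0.25, 0.5, 1.0, 2.0, 5.0):
    print(f"{lam:>4}x/haplotype: uncovered={p_site_uncovered(lam):.3f}, "
          f"het fully observed={p_het_both_alleles_seen(lam):.3f}")
```

Well below 1x per haplotype, most heterozygous sites are never observed directly as heterozygous. Genotypes at those sites have to be inferred from haplotypes shared with other lines, which is the job of the imputation step.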

Long-read low-pass (LRLP) takes this further. Because long reads span complex and repetitive regions, each read carries more usable information than a short read, even at the same nominal coverage. For the same number of bases sequenced you get fewer reads, but each one maps more of the genome uniquely and links multiple variant sites on a single haplotype, which can give imputation more to work with.
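The value of a read that links several variant sites can be shown with a deliberately simplified imputation sketch in Python. Here a read comes from one haplotype, and the haplotype's unobserved sites are filled in from whichever reference haplotype best matches the sites the read does cover. Production imputation tools use hidden Markov models over many reference haplotypes and allow for recombination; this toy only illustrates why linked observations narrow the candidates. The reference panel and reads are made-up examples.

```python
from typing import Dict, List

Haplotype = List[int]  # one allele per variant site: 0 = reference, 1 = alternate


def mismatches(read: Dict[int, int], hap: Haplotype) -> int:
    """Count sites where the read's observed allele disagrees with a haplotype."""
    return sum(1 for site, allele in read.items() if hap[site] != allele)


def best_matches(read: Dict[int, int], panel: List[Haplotype]) -> List[Haplotype]:
    """All reference haplotypes tied for the fewest mismatches with the read."""
    scores = [mismatches(read, hap) for hap in panel]
    best = min(scores)
    return [hap for hap, score in zip(panel, scores) if score == best]


def impute(read: Dict[int, int], panel: List[Haplotype]) -> Haplotype:
    """Keep observed alleles; fill the rest from the first best-matching haplotype."""
    template = best_matches(read, panel)[0]
    return [read.get(site, allele) for site, allele in enumerate(template)]


# Hypothetical reference haplotypes over six variant sites.
panel = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]

short_read = {1: 1}             # observes one site
long_read = {0: 1, 1: 1, 3: 1}  # one molecule spanning three sites

print(len(best_matches(short_read, panel)))  # 2
print(len(best_matches(long_read, panel)))   # 1
print(impute(long_read, panel))              # [1, 1, 0, 1, 0, 1]
```

The short read is consistent with two different haplotypes, so the remaining sites stay ambiguous. The long read picks out a single one.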

This is what makes LRLP the practical entry point for breeding programs transitioning off arrays. You do not need a separate deep-sequencing protocol. A single PacBio HiFi cell, multiplexed across many samples, delivers population-scale genotyping that captures the variant classes arrays miss: rare alleles, structural variants, and homeolog-specific calls in polyploids.

Addressing the Real Transition Concerns

"What happens to my existing array data?"

It does not disappear. Array genotypes and sequence-derived genotypes can be used together in mixed-model frameworks. Your historical data has value. The transition is additive — you are expanding what you can detect going forward, not invalidating what you have already done.

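If you have large cohorts of array-genotyped lines that you want to extend with sequencing, that is a solvable problem. The methods for integrating the two data types are established and actively used, for example analyzing the markers shared by both platforms, or imputing array-genotyped lines up to sequence density using the sequenced lines as a reference panel.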
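One common way to put both data types into a single genomic prediction model is to keep only the markers genotyped on both platforms and build a genomic relationship matrix (G) for GBLUP. The Python/NumPy sketch below implements VanRaden's first method. It assumes diploid 0/1/2 dosage coding, markers matched by a (chromosome, position) key on the same reference, and no missing calls. A polyploid crop would need different dosage coding and scaling. This is one approach among several, not a prescribed workflow, and the marker keys and dosages are hypothetical.

```python
import numpy as np


def shared_markers(array_ids, seq_ids):
    """Marker keys present on both platforms, in a stable sorted order."""
    return sorted(set(array_ids) & set(seq_ids))


def subset(dosages: np.ndarray, ids, keep) -> np.ndarray:
    """Select columns of a lines x markers dosage matrix by marker key."""
    index = {marker: j for j, marker in enumerate(ids)}
    return dosages[:, [index[m] for m in keep]]


def vanraden_grm(dosages: np.ndarray) -> np.ndarray:
    """G = Z Z' / (2 * sum p(1 - p)), with Z the dosages centred by 2p.

    Monomorphic markers are dropped so the denominator is non-zero.
    """
    p = dosages.mean(axis=0) / 2.0
    polymorphic = (p > 0) & (p < 1)
    p, d = p[polymorphic], dosages[:, polymorphic]
    z = d - 2.0 * p
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


# Hypothetical markers and dosages (alternate-allele counts per line).
array_ids = [("chr1", 100), ("chr1", 250), ("chr2", 40)]
seq_ids = [("chr1", 100), ("chr1", 180), ("chr1", 250), ("chr2", 40)]
array_lines = np.array([[0, 2, 1], [2, 0, 1]], dtype=float)
seq_lines = np.array([[0, 1, 2, 0], [1, 0, 2, 2], [2, 2, 0, 1]], dtype=float)

keep = shared_markers(array_ids, seq_ids)
combined = np.vstack([subset(array_lines, array_ids, keep),
                      subset(seq_lines, seq_ids, keep)])
G = vanraden_grm(combined)
print(G.shape)  # (5, 5)
```

Restricting to shared markers discards the extra sequence-only variants. That is the main reason the imputation route is often preferred once enough lines have been sequenced.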

"Is sequencing operationally feasible for a large diversity panel?"

Yes. Current PacBio HiFi protocols support up to 96-plex pooling per sequencing cell, depending on genome size. A 300-line diversity panel can be sequenced across a small number of cells and returned with per-sample variant calls.
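How far a panel stretches per cell is back-of-the-envelope arithmetic. The Python sketch below assumes that barcoded samples share a cell's output evenly and uses the 96-plex ceiling mentioned above. Cell yield and reference size are inputs you supply, and the example values are placeholders, not PacBio specifications or Veil figures.

```python
import math


def cells_needed(n_samples: int, max_plex: int = 96) -> int:
    """Smallest number of cells that keeps every cell at or below max_plex."""
    return math.ceil(n_samples / max_plex)


def coverage_per_haplotype(cell_yield_gb: float, samples_per_cell: int,
                           reference_size_gb: float,
                           haplotypes_per_locus: int = 2) -> float:
    """Expected depth per haplotype, assuming an even split of cell output.

    reference_size_gb: size of the assembly the reads are mapped to.
    haplotypes_per_locus: copies of each locus per sample (2 for a diploid,
    and 2 per subgenome in an allotetraploid whose homeologs map separately).
    """
    per_sample_gb = cell_yield_gb / samples_per_cell
    return per_sample_gb / reference_size_gb / haplotypes_per_locus


panel_size = 300
cells = cells_needed(panel_size)
samples_per_cell = math.ceil(panel_size / cells)
# Placeholder inputs: replace with your instrument's actual yield and your genome size.
depth = coverage_per_haplotype(cell_yield_gb=90.0, samples_per_cell=samples_per_cell,
                               reference_size_gb=2.7)
print(cells, samples_per_cell, round(depth, 2))  # 4 75 0.22
```

Doubling the genome size or the plex level halves the per-sample depth. This is why the practical multiplex level depends on genome size.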

Library preparation for LRLP requires high molecular weight (HMW) DNA, so extraction quality matters more than it does for short reads. Veil Genomics uses a proprietary plate-based HMW extraction protocol developed specifically for high-throughput LRLP, supporting up to 96-plex sample processing with robotic automation. Labs that do not want to run extraction and library prep internally can send samples directly to Veil for end-to-end sample preparation.

"What does it cost compared to arrays?"

Cost per sample depends on genome size, multiplex level, and coverage target. For a typical diploid crop genome at low-pass coverage, cost is competitive with mid-range array pricing once you factor in the expanded variant yield. For polyploid species, where array data quality is lower, the cost-per-useful-datapoint comparison shifts further toward sequencing. Exact pricing is available on request.
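The cost-per-useful-datapoint comparison is a simple ratio: what a sample costs, divided by the number of its variants that pass QC and are informative in your panel. The Python sketch below uses deliberately made-up prices and counts only to show the calculation. Plug in your own array quotes, sequencing quotes, and QC pass rates.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    cost_per_sample: float   # your quoted price, any currency
    variants_reported: int   # variants returned per sample
    usable_fraction: float   # share passing QC and polymorphic in your panel

    def cost_per_usable_variant(self) -> float:
        usable = self.variants_reported * self.usable_fraction
        return self.cost_per_sample / usable


# Placeholder numbers only; these are not real prices or yields.
options = [
    Platform("array", cost_per_sample=50.0, variants_reported=20_000, usable_fraction=0.5),
    Platform("lrlp", cost_per_sample=100.0, variants_reported=200_000, usable_fraction=0.8),
]
for platform in options:
    print(f"{platform.name}: {platform.cost_per_usable_variant():.6f} per usable variant")
```

For a polyploid, lowering the array's usable_fraction to reflect homeolog-confounded calls is where the comparison shifts further toward sequencing.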

"What about bioinformatics?"

Long-read alignment and variant calling pipelines are well established. PacBio-compatible tools are documented and widely used, among them pbmm2 for alignment, DeepVariant for SNVs and small indels, and pbsv for structural variants. If your lab already runs short-read variant calling pipelines, the workflow is analogous.
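For orientation, the per-sample flow with these tools is: align HiFi reads with pbmm2, call SNVs and indels with DeepVariant's PacBio model, and call SVs with pbsv in two steps (discover, then call). The Python sketch below only assembles those command lines without running them, so you can review them or hand them to a workflow manager. The file names are hypothetical, the options shown are a minimal subset, and each flag should be checked against the --help output of the versions you install.

```python
from pathlib import Path
from typing import List


def lrlp_commands(sample: str, reads_bam: str, reference: str,
                  outdir: str = "out") -> List[List[str]]:
    """Build (but do not execute) alignment and variant-calling commands for one sample."""
    out = Path(outdir)
    aligned = str(out / f"{sample}.aligned.bam")
    small_vcf = str(out / f"{sample}.small.vcf.gz")
    svsig = str(out / f"{sample}.svsig.gz")
    sv_vcf = str(out / f"{sample}.sv.vcf")
    return [
        # Align HiFi reads to the reference and coordinate-sort the output.
        ["pbmm2", "align", reference, reads_bam, aligned, "--preset", "HIFI", "--sort"],
        # Small variants (SNVs and indels) with DeepVariant's PacBio model.
        ["run_deepvariant", "--model_type=PACBIO", f"--ref={reference}",
         f"--reads={aligned}", f"--output_vcf={small_vcf}"],
        # Structural variants: collect SV signatures, then call them.
        ["pbsv", "discover", aligned, svsig],
        ["pbsv", "call", reference, svsig, sv_vcf],
    ]


for cmd in lrlp_commands("line_001", "line_001.hifi_reads.bam", "reference.fasta"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to dry-run a whole panel before committing compute.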

For labs that do not have in-house bioinformatics capacity, informatics support for LRLP data is available.

How to Get Started

The most effective way to evaluate the transition is a pilot run on a subset of your panel.

Take a subset of lines from an existing diversity panel — ideally ones you have already phenotyped and have array genotypes for. Run them through LRLP sequencing. Compare the variant yield, the GWAS signal, and the SV calls against what your array pipeline returned on the same lines.
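The pilot comparison can start with two numbers per line: how often the array and sequencing genotypes agree at sites both platforms report, and how many sequencing calls fall at positions the array never queries. Below is a minimal Python sketch. It assumes both call sets have been converted to the same reference coordinates and genotype coding (here, alternate-allele counts keyed by chromosome and position), and the example calls are invented.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position) on a shared reference


def compare_calls(array_calls: Dict[Site, int], seq_calls: Dict[Site, int]) -> dict:
    """Summarize agreement between array and sequencing genotypes for one line."""
    shared = array_calls.keys() & seq_calls.keys()
    agree = sum(1 for site in shared if array_calls[site] == seq_calls[site])
    return {
        "shared_sites": len(shared),
        "concordance": agree / len(shared) if shared else float("nan"),
        "sequencing_only_sites": len(seq_calls.keys() - array_calls.keys()),
        "array_only_sites": len(array_calls.keys() - seq_calls.keys()),
    }


array_calls = {("chr1", 100): 0, ("chr1", 250): 2, ("chr2", 40): 1}
seq_calls = {("chr1", 100): 0, ("chr1", 180): 1, ("chr1", 250): 2,
             ("chr2", 40): 2, ("chr3", 75): 0}
print(compare_calls(array_calls, seq_calls))
```

Discordant shared sites are worth inspecting by hand. In polyploids, array heterozygous calls that sequencing reports as homozygous often fit the homeolog pattern described earlier.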

This gives you concrete data from your own population, in your own species, before you commit to a full program change. It also gives you the methodological bridge between your historical array data and your sequencing data going forward.

You do not need to rebuild your breeding program to start. You need one pilot.

The Bottom Line

SNP arrays have served plant breeding well. But they were designed for a world where sequencing was too expensive to run at population scale. That world is changing.

Long-read low-pass sequencing gives you genome-wide variant data, including structural variants, rare alleles, and complex polyploid regions, at a scale and cost that are practical for large breeding programs.

If your program is running into the limits of what arrays can show you, sequencing is not a distant upgrade. It is available now.

Ready to see what your population looks like at genome-wide resolution? Request a quote or contact one of our scientists.


Veil Genomics