You Don't Have a Data Problem. You Have a Resolution Problem.

Apr 3

Over the past decade, plant breeding has undergone a data revolution. Sequencing costs have dropped, datasets have grown, and genomic tools are now embedded in modern breeding programs.

And yet, many of the most important traits remain frustratingly difficult to resolve.

Why?

Because most breeding programs don’t actually have a data problem. They have a resolution problem.

What We’re Missing

Short-read sequencing and SNP-based approaches have powered enormous progress. They are fast, scalable, and cost-effective.

But they come with a tradeoff: they capture only a fraction of the genomic variation that drives real biological differences.

In complex plant genomes—especially polyploids—key signals often live in places that short reads struggle to resolve:

Structural variants (insertions, deletions, rearrangements)
Haplotype structure across long genomic regions
Highly repetitive or duplicated sequences

These aren’t edge cases. They are often the mechanisms underlying important traits like yield, stress tolerance, and fruit quality.

When we rely only on partial genomic information, we don’t just lose detail—we risk missing the biology entirely.

Why Resolution Matters in Breeding

In practical terms, limited resolution shows up in familiar ways:

GWAS signals that don’t quite make sense
QTLs that fail to replicate across populations
Markers that work in one background but not another

These challenges aren’t always due to noise or experimental design. Often, they reflect an incomplete view of the genome.

If the causal variant is a structural variant—or embedded within a complex haplotype—SNPs alone may never fully capture it.

The result?

More time spent chasing signals.
More seasons lost validating markers.
And slower progress toward improved varieties.
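To make the point about SNPs and structural variants concrete, here is a toy sketch that computes linkage disequilibrium (r²) between a marker SNP and a deletion across phased haplotypes. It illustrates the general principle and does not describe any production pipeline. Under a simple additive model with a single causal variant, a marker in LD r² with that variant captures roughly r² of the trait variance the variant explains. The function name `ld_r2` and all haplotype vectors below are hypothetical, chosen only for illustration.

```python
def ld_r2(a, b):
    """Return r^2, the squared allelic correlation between two biallelic sites.

    a, b: equal-length lists of 0/1 alleles, one entry per phased haplotype.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    n = len(a)
    p_a = sum(a) / n                             # allele-1 frequency at site A
    p_b = sum(b) / n                             # allele-1 frequency at site B
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of haplotypes carrying both allele-1s
    d = p_ab - p_a * p_b                         # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom


# Hypothetical background 1: ten phased haplotypes.
# sv:  1 = deletion present (assumed here to be the causal variant)
# snp: 1 = alternate allele at a nearby marker SNP
sv_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
snp_1 = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical background 2: the same two variants, assorted independently,
# so the marker allele no longer tracks the deletion.
sv_2 = [1, 1, 0, 0]
snp_2 = [1, 0, 1, 0]

print(f"background 1 r^2 = {ld_r2(sv_1, snp_1):.2f}")  # prints 0.34
print(f"background 2 r^2 = {ld_r2(sv_2, snp_2):.2f}")  # prints 0.00
```

In the first background, the marker carries its alternate allele on three of the four deletion-bearing haplotypes (plus one haplotype without the deletion), yet it recovers only about a third of the deletion's signal. In the second, r² is zero, so a marker that looked useful in one population carries no information in another. Real LD estimates rest on far more haplotypes and are noisier than this toy case.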

The Shift Toward Long-Read Insight

Long-read sequencing has fundamentally changed what’s possible.

For the first time, we can directly observe:

Structural variation at scale
Phased haplotypes across meaningful genomic distances
Variation in regions that were previously inaccessible

But for many breeding programs, long-read sequencing has remained out of reach—too expensive, too data-heavy, or too complex to integrate into existing pipelines.

Bringing Resolution to Real-World Breeding

At Veil Genomics, we’re focused on closing that gap.

By leveraging long-read low-pass sequencing (LRLP), we make it possible to capture high-resolution genomic information at a scale and cost that works for breeding programs—not just reference genomes.

This means:

Detecting variants that short reads miss
Improving the accuracy of downstream analyses like GWAS and QTL mapping
Providing data that reflects the true complexity of plant genomes

Most importantly, it means turning sequencing data into actionable insight.

The Future Is Better Data

Breeding programs don’t necessarily need more gigabases. They need clearer answers.

As genomic tools continue to evolve, the advantage will go to those who can see the genome more completely—who can move beyond proxies and directly observe the variation that matters.

Because in the end, it’s not about how much data you have. It’s about how clearly you can see.

Curious whether your current sequencing approach is leaving signals on the table? Get in touch. We’re happy to talk through your data.

Kendall Lee, PhD