The Real Cost of Sequencing: What Per-Sample Price Doesn't Tell You

Jul 3

Every conversation about sequencing eventually arrives at the same question: how much per sample?

It is the wrong question.

Per-sample price is a procurement metric. It tells you what you pay to run a lane. It tells you nothing about what you learn from it. And in genomics, what you learn is the only thing that matters.

The researchers who have figured this out don't ask what a sample costs. They ask what an answer costs. That shift in framing changes every sequencing decision you will make.

Why Cost Per Sample Became the Default Metric

Short-read sequencing made per-sample pricing the standard because short-read platforms are optimized around throughput. The business model is volume: run more samples, lower the marginal cost per sample. Illumina built an industry around that logic, and researchers built their budgets around it.

It's worth distinguishing between per-sample cost and scientific value. A run that's inexpensive on paper but incomplete in coverage or variant detection can end up costing more in downstream re-work, or simply leaving questions unanswered.

When you price sequencing by the sample, you are pricing the reagents and the machine time. You are not pricing the discovery.

The Unit That Actually Matters: Cost Per Insight

Consider two researchers studying the same crop population. Researcher A runs 500 samples on a short-read platform at $150 per sample. Total cost: $75,000. The data comes back. They find SNPs in the easily mappable regions of the genome. They build a marker panel. They move forward.

Researcher B runs 200 samples using long-read low-pass sequencing. More expensive per sample. Fewer samples. Total cost: comparable. The data comes back. They find SNPs, indels, and structural variants across the full genome, including the repetitive regions short reads cannot reach. They identify a large structural variant linked to disease resistance. They build a more complete picture with fewer samples.

Who spent more per insight?

This is not a hypothetical argument. It reflects a documented reality in genomics: the regions that can matter most for complex traits fall disproportionately where short reads fail. Missing those regions is not a neutral tradeoff. It is a systematic bias in your data that compounds across every analysis you run downstream.
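The comparison between the two researchers can be put into rough numbers. The sketch below is illustrative only: Researcher A's sample count and price come from the example above, while Researcher B's per-sample price (picked so the two totals match) and both insight counts are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class SequencingPlan:
    name: str
    samples: int
    price_per_sample: float  # USD
    insights: int  # validated findings; hypothetical counts for illustration

    @property
    def total_cost(self) -> float:
        return self.samples * self.price_per_sample

    @property
    def cost_per_insight(self) -> float:
        if self.insights == 0:
            return float("inf")  # money spent, nothing learned
        return self.total_cost / self.insights


# Researcher A: sample count and price from the example above.
# One insight assumed: the SNP marker panel.
plan_a = SequencingPlan("short-read", samples=500, price_per_sample=150.0, insights=1)

# Researcher B: price is an assumption chosen so the total is "comparable".
# Two insights assumed: a marker panel plus the resistance-linked structural variant.
plan_b = SequencingPlan("long-read low-pass", samples=200, price_per_sample=375.0, insights=2)

for plan in (plan_a, plan_b):
    print(f"{plan.name}: total ${plan.total_cost:,.0f}, "
          f"${plan.cost_per_insight:,.0f} per insight")
```

How you count an "insight" is the real judgment call. The arithmetic only makes that judgment explicit so the two plans can be compared on the same footing.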

What Short-Read Pricing Actually Hides

The Mappability Tax

Short reads are typically around 150 base pairs long. When those reads fall in repetitive elements, centromeres, or telomeres, they often cannot map uniquely, and GC-rich promoters tend to be under-covered. Ambiguously mapped reads are usually filtered out or down-weighted. Your sequencing run produces data, but that data has blind spots. You paid for coverage you cannot use.

Long reads typically run 10,000 to 25,000 base pairs. They span many repetitive regions entirely. Much of the genomic territory that is a blind spot for short reads becomes readable with long reads. You pay for coverage and you get coverage.
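A toy example shows why read length matters for unique mapping. This is a deliberately simplified sketch: it uses exact string matching rather than a real aligner, the sequences are made up, and the lengths are scaled down to a few dozen bases so the effect is easy to see.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where the read matches the genome exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        # Step forward by one so overlapping matches are counted too.
        start = genome.find(read, start + 1)
    return count


# Made-up toy genome: unique flanks around a tandem repeat (8 copies of "ACGT").
left_flank = "TTGACCAGGA"
repeat = "ACGT" * 8
right_flank = "CCATGGTCAA"
genome = left_flank + repeat + right_flank

short_read = genome[14:22]  # 8 bases taken from inside the repeat
long_read = genome[5:47]    # whole repeat plus unique sequence on both sides

print("short read placements:", count_placements(genome, short_read))
print("long read placements:", count_placements(genome, long_read))
```

The short read fits at several positions inside the repeat, so its true origin is ambiguous. The long read carries unique flanking sequence on both sides and has only one possible placement.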

The Hidden Cost of Missed Variants

When your method cannot detect structural variants, you do not just miss those variants. You potentially misinterpret your phenotype data. A GWAS that is powered only by SNPs will show unexplained heritability in the regions where structural variants are concentrated. Researchers have a name for this: missing heritability. Missing heritability has many proposed explanations. Structural variants, systematically underdetected by SNP arrays and short-read methods, are increasingly recognized as one of them.

Every grant renewal spent chasing missing heritability with a method that cannot detect structural variants is a direct cost of per-sample pricing logic. It just does not show up on the sequencing invoice.

The Imputation Gamble

Low-coverage short-read sequencing is often paired with imputation: sequencing most samples at low coverage and using statistical methods, anchored by a reference panel or a smaller set of samples sequenced at higher coverage, to infer the genotypes that were not directly observed. This works reasonably well in human populations with large reference panels. It is less reliable in non-model organisms, polyploid species, or populations with limited reference data.

Imputation from long-read data can be more accurate because long reads provide better physical scaffolding. They capture haplotype blocks intact. The imputed genotypes downstream are built on a sturdier foundation. The difference in accuracy may not appear in your per-sample invoice, but it will appear in the reliability of your findings.

How to Reframe Your Sequencing Budget

Before selecting a sequencing method, ask three questions:

What fraction of the genome is this method capable of covering in my species?
What variant classes does this method detect, and which does it systematically miss?
If I run this method and miss the variants it cannot detect, what does that cost me in time, in repeat experiments, and in the conclusions I cannot draw?

These questions reframe cost from a procurement decision into a scientific decision. Running fewer samples with better coverage can produce more science per dollar than running more samples with incomplete data.
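One way to put the third question into numbers is to add an expected rework cost to the invoice. The sketch below is a simplified model, not an established formula: `rework_probability` and `rework_cost` are hypothetical inputs you would need to estimate for your own project, and every figure shown is a placeholder.

```python
def expected_total_cost(invoice: float, rework_probability: float, rework_cost: float) -> float:
    """Invoice plus the probability-weighted cost of repeating work."""
    if not 0.0 <= rework_probability <= 1.0:
        raise ValueError("rework_probability must be between 0 and 1")
    return invoice + rework_probability * rework_cost


# Placeholder figures for illustration only (USD).
cheap_but_incomplete = expected_total_cost(60_000, rework_probability=0.5, rework_cost=80_000)
pricier_but_complete = expected_total_cost(75_000, rework_probability=0.1, rework_cost=80_000)

print(f"cheaper invoice, likely rework:  ${cheap_but_incomplete:,.0f}")
print(f"higher invoice, unlikely rework: ${pricier_but_complete:,.0f}")
```

With these placeholder inputs the cheaper invoice ends up as the more expensive plan. The value of the exercise is that it forces an explicit estimate of how likely a method is to leave you repeating experiments.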

The Right Question for Your Next Sequencing Decision

What does an answer cost?

Not a sample, an answer. A phenotype explained, a locus identified, a marker validated for breeding selection.

When you price sequencing that way, the calculus changes. Methods that produce richer data per sample become more competitive. Methods that are cheap per sample but produce incomplete data become more expensive per discovery.

This is not an argument against short-read sequencing. Short reads have genuine advantages in specific applications: high-coverage variant confirmation, well-characterized model organisms with clean reference genomes, and studies where the biology is fully contained in the mappable fraction of the genome.

It is an argument for choosing your sequencing method based on what your research question actually requires, and pricing the decision accordingly.

What This Means for Grant Budgeting

Reviewers are increasingly familiar with this argument. A grant proposal that justifies a sequencing approach based on per-sample price alone is a weaker proposal than one that justifies the choice based on data completeness and variant detection capability relative to the stated research question.

If your aim is to characterize structural variation in a polyploid crop species, proposing short-read sequencing because it is cheaper per sample is a methodological mismatch. A well-framed budget section explains: here is what we need to detect, here is the method that detects it reliably at scale, and here is why the cost per insight is justified by the scientific question. That framing is defensible.

The Bottom Line

Per-sample price is a procurement metric. It does not measure scientific value.
Cost per insight accounts for data completeness, variant detection, and downstream analytical reliability.
Short-read sequencing has real blind spots in repetitive regions, structural variants, and complex genomic architecture.
Long-read low-pass sequencing detects the full variant spectrum at population scale, which changes the cost-per-discovery equation.
Fewer samples with better data can produce more science than more samples with incomplete data.
Grant reviewers respond to methodological justification, not to per-sample price comparisons.

Working through your sequencing budget for an upcoming grant or study? We're happy to talk through your options.

Kendall Lee, PhD