Banner

Interested in the products mentioned in this resource?

Send us a message

Introduction

Next generation sequencing (NGS) is now in routine use for a broad range of research and clinical applications. The rapid rate of adoption has been facilitated by falling reagent costs, benchtop instruments, improved chemistries and improved data analysis solutions. However, the cost and complexity of data analysis still remain significant hurdles — particularly for whole genome sequencing. In the majority of cases, targeted approaches, such as custom NGS panels, are more cost-effective and generate significantly less, but equally meaningful data in a much shorter timescale.

Targeted sequencing requires an initial sequence enrichment step, which, if poorly designed, can be a source of bias and error in the downstream sequencing assay1. This application note discusses the main strategies employed to optimise the enrichment step, depending on the type of assay chosen.

Which enrichment assay?

Two broad categories of enrichment assays exist: amplicon (PCR) and hybridisation (Figure 1). As a very general rule, hybridisation-based assays, when designed well, offer superior performance2.

Hybridisation protocols start with random shearing of the DNA, followed by “capture” of the randomly sheared overlapping fragments with long oligonucleotide (oligo) baits. This allows independent sequencing of a large number of unique fragments. Any duplicates (assay artefacts) can be easily identified and removed, leaving high-quality data for analysis. Because the fragments are randomly sheared they should not align perfectly with one another and if they do, they are most certainly duplicates. In addition, enrichment of challenging regions such as GC-rich regions or internal tandem repeats can be optimised by careful positioning and design of baits. Long oligo baits can tolerate sequence variation, so that all alleles of a heterogeneous mix can be captured equally. Amplicon assays require design of primers flanking the region to be amplified. The resulting amplification products are identical, such that duplicates cannot be distinguished from unique products.

In making the choice between hybridisation and amplicon approaches, there are several factors which are worth considering (Figure 2).

1. Size of the region to be targeted

  • Hybridisation-based assays are amenable to any size of target region, from very small to very large (i.e. whole exome). 
  • Amplicon assays, although ideal for small numbers of well-defined regions, are challenging to multiplex to any great extent. As the degree of multiplexing and/or the number of PCR cycles increases, so does the tendency towards bias and error. Primer competition and non-uniform amplification of target regions caused by varied GC content (Figure 3) or amplicon length for example in the presence of an insertion (under-represented) or deletion (over-represented) (Figure 4), contribute to variation in amplification efficiency. Some platforms attempt to overcome this by performing hundreds to thousands of single-plex reactions which are then combined prior to sequencing.

2. Required turnaround time

  • Hybridisation assay protocols are more time consuming than amplicon approaches. However, hybridisation protocols that once took two to three days, and require large numbers of manual steps are now much more streamlined. Oxford Gene Technology (OGT), as an example, in its library preparation protocol has introduced a short enzymatic fragmentation step, combined the end-repair and adaptor ligation steps, and optimised the hybridisation step itself to just 30 minutes. Meaning that hybridisation assay protocols can now take users from sample to sequencer in a single day. 
  • For speed and simplicity, PCR-based approaches are normally faster, only a few hours in duration, with fewer steps. It is important to note, however, that PCR itself is the most common source of bias and error in any enrichment assay, so this advantage of speed may need to be balanced with the requirement for high-quality data, and the additional time required to validate potential false positive results.

 

3. Ability to optimise for challenging regions

  • Some genes such as FLT3 contain internal tandem duplications that are challenging to target using amplicon approaches because they are by nature repetitive and can be very long and are generally masked in most designs. To provide optimal results, OGT employs sophisticated bait design strategies to create additional probes both up- and down-stream of the repetitive region. The length of these probes is customised to provide the same isothermal properties as other probes in the panel, leading to improved performance. This facilitates the capture of sequences around the repeat masked areas, and because of the read lengths, can read through short (up to 100 bp) repetitive regions.
  • Other challenges arise for amplicon assays when novel variants are present within the primer site (Figure 5), as these can result in strand or allelic bias, or even drop-out of that region altogether. Hybridisation assays, on the other hand, are less restricted by variant position and can still enrich all strands and alleles equally even in the presence of multiple novel variants.
  • Similarly, with regions that are GC-rich, hybridisation baits can still be designed to capture efficiently giving much more uniform coverage than amplicon assays. An example of this is given in Figure 6, where the high (up to 90%) GC content of the CEBPA gene has made it notoriously difficult to sequence and analyse reliably with NGS3. However, OGT’s bait design expertise can overcome these difficulties, providing excellent depth of coverage and uniformity.

4. Robustness with challenging samples

Samples vary in quality and quantity so any assay must be able to deal with a wide range of input DNA types and input quantities:

  • Formalin-fixed, paraffin-embedded (FFPE) samples: Both hybridisation and amplicon assays can be optimised to perform well with FFPE, though PCR methods, in particular, are susceptible to contaminants found in FFPE material. Assay performance can be substantially improved by the usage of an upstream FFPE repair step (Figure 7), which can remove a broad range of damage, such as nicks and gaps, oxidised bases and cytosine to uracil deamination.
  • Starting quantity of DNA: Amplicon assays offer a slight advantage in being able to work with smaller quantities of input DNA, often down to 10ng. Hybridisation assays generally require more input DNA — typically ~500 ng — although well designed hybridisation assays can utilise significantly less input DNA (Figure 8), where using the OGT SureSeq FFPE Repair Mix a reduction in the amount of starting material down to 100ng or lower is possible, depending on the required depth of coverage required.
  • Duplicates: It is important to note that with small starting quantities, the actual number of templates available is relatively low, so duplication rates can increase significantly. With hybridisation assays, these can be removed computationally to leave clean, high-quality data. With amplicon assays, this is not possible without the use of molecular barcodes, and the resulting data may be skewed by the over-amplification of a small number of fragments.

 

5. Requirement for accuracy and sensitivity

Performance should be a key requirement for all applications: the confidence that the assay will detect all variants present in any region of interest, while avoiding false negatives and false positives. Hybridisation assays offer a number of key benefits which enhance performance:

  • Reduced false negatives:The most common reason for false negatives in a targeted sequencing assay is poor coverage at the locus. Hybridisation assays, when well designed, can deliver superior uniformity of enrichment and excellent coverage of all loci, thereby reducing the incidence of false negatives. 
  • Reduced false positives:The most common cause of false positives are artefacts introduced by PCR polymerases, even when using proofreading enzymes. Hybridisation assays use very few PCR cycles, in comparison to amplicon assays, and therefore the data is less “noisy”. 
  • Higher detection sensitivity:The reduced “noise” in the hybridisation assay also delivers higher detection sensitivity of variants present at low frequency in the sample. 
  • Broader scope for discovery of novel as well as known variants:It can be challenging to validate an assay for the discovery of novel variants across large numbers of genes. However, because the ability to detect variants with sensitivity and accuracy largely relies on the depth of coverage at the locus, as well as quality and accuracy of the data, hybridisation-based assays can offer greater confidence in detection of all variants present.

 

6. Price

  • Price is always an important factor. For larger regions, hybridisation-based panels are very cost effective. For smaller regions, amplicon assays can be more cost effective because the cost of a small number of primers is low. Price must always be considered in parallel with performance requirements for the application of interest and with the cost of any additional sequencing required downstream.

Uniformity of enrichment as a key metric of performance

The ultimate goal of any sequencing assay is to discover all variants present. Uniformity of enrichment means that all regions are represented more equally, and that variants present in any region will be called (Figure 9). It also allows much lower average sequencing depths to be used, enabling larger numbers of samples to be multiplexed in a run, and significant cost savings.

Uniformity is particularly important when looking at heterogeneous samples. For example tumour mixed with normal tissue, or somatic variants present only within a single clone in a heterogeneous tumour sample where it is essential to have enough reads to confidently call a variant at any given position.

The importance of optimising the enrichment assay

As NGS moves forwards, the aspiration is clear: confident calling of all variants present with no false negatives and no false positives. The most common reason for missing variants (i.e. false negatives) is lack of coverage at the variant locus due to non-uniform enrichment, the importance of which is illustrated with the example of exon 9 of CALR gene, an important exon for interrogation of myeloproliferative disorders (Figures 10 and 11).

The highlighted box below provides a summary of all of the potential sources of error and bias. As a very broad rule, hybridisation-based assays offer greater opportunity for optimisation through probe design and placement, and they can offer better uniformity of coverage, fewer false positives, and superior variant detection due to fewer PCR cycles. Hybridisation-based assays also offer greater scope in terms of the number of genes and regions that can be targeted.

Conclusions

The choice of enrichment assay for targeted sequencing assays is an important consideration. No enrichment technology is perfect for every application. The choice will depend largely on the size of region to be targeted, cost of sequencing and the required coverage uniformity and sensitivity of the assay. Optimisation is also important and hybridisation-based assays offer more scope for superior performance through optimisation of bait design.

Spotlight: Potential sources of bias and error in sequence enrichment assays

PCR artefacts

Even proof-reading PCR polymerases introduce errors. The likelihood of artefacts (as well as the rate of duplication) increases with increasing PCR cycles. Amplicon-based assays are reliant entirely on PCR, and are potentially more susceptible to artefacts than hybridisation-based assays, which aim to minimise the number of PCR cycles.

Duplicate reads from a single template

Duplicates are amplification artefacts, normally arising during the library preparation stage. It is highly desirable to remove them prior to data analysis, otherwise some regions may be massively over-represented. In a hybridisation-based assay, duplicate reads can be identified easily and removed. However, in a PCR assay, not only is it impossible to remove duplicates during the analysis step, but it is also true that there are generally more duplicates due to the enrichment step. Duplicates are a particular problem when input quantities of DNA are limiting, because higher numbers of PCR cycles need to be used.

Bias in PCR amplification — PCR drift and PCR selection

PCR is difficult to multiplex where consistent and uniform amplification of each region is desired. There are thought to be two main processes that introduce amplification bias during PCR, PCR selection and PCR drift4. In PCR selection, some amplicons are favoured and therefore over-represented due to intrinsic properties of the target sequence, flanking sequences or genome composition. Key contributors to this type of variation include preferential denaturation and amplification of low GC content templates, higher binding efficiency of GC-rich primers (particularly when using degenerate primers) and direct correlation between amplification efficiency and gene copy numbers. PCR drift is assumed to be caused by random interaction of the components of the mix early on in the amplification when the original genomic material is still the main source of template. This type of variation is variable between reactions and more difficult to control. There are ways to optimise PCR that will reduce bias1 including increasing the amount of template, reducing the number of cycles, optimising the instrumentation and performing the multiplex in a number of discrete, lower-plex reactions which are then pooled after amplification.

Variant positional bias in amplicon assays

Variant positional bias is a particularly important consideration for amplicon assays where there is some constraint in choice of position for primer sites. If a SNP or variant happens to fall within the primer site itself, it will not be detected. This is less likely to be an issue with hybridisation-based methods that use long oligonucleotides that can tolerate target sequence variation and can be tiled across the region to be enriched.

Repeat regions and pseudogenes

Repeat regions and pseudogenes represent significant challenges for all enrichment technologies. Depending on the size of the region, hybridisation-based assays may allow better targeting of these regions, by allowing design of baits to flanking regions.

References

  1. Aird, D. et al (2011) Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology 12:R18 doi:10.1186/gb-2011-12-2-r18 
  2. Samorodnitsky, E. et al (2015) Evaluation of hybridization capture versus amplicon-based methods for whole-exome sequencing. Hum Mutat 36(9), 903-915 
  3. Behdad, A. et al (2015) A clinical grade sequencing-based assay for CEBPA mutation testing: report of a large series of myeloid neoplasms. J Mol Diag 17(2), 76-84 
  4. Wagner, A. et al (1994) Surveys of gene families using polymerase chain reaction: PCR selection and PCR drift. Syst Biol 43, 250–261

 

SureSeq™: For Research Use Only; Not for Diagnostic Procedures. This document and its contents are © Oxford Gene Technology IP Limited – 2021. All rights reserved. Trademarks: OGT™, SureSeq™ and myPanel™ (Oxford Gene Technology); Agilent® (Agilent Technologies Inc.); MiSeq® (Illumina Inc.).

Connect with our NGS experts today

Call +44 (0)1865 856800 Email contact@ogt.com

Send us a message and we will get back to you