Targeted Sequencing Approach and Its Clinical Applications for the Molecular Diagnosis of Human Diseases
The COVID-19 outbreak has recently boosted the next-generation sequencing (NGS) market. Targeted sequencing (TS) has become an important routine technique in both clinical and research settings, with advantages including high confidence and accuracy, a reasonable turnaround time, relatively low cost, and a lighter data burden in terms of bioinformatics and computational demand. Since there are no clear consensus guidelines covering the wide range of NGS platforms and techniques, there is a vital need for researchers and clinicians to develop efficient approaches, especially for the molecular diagnosis of emerging diseases and during global outbreaks such as the COVID-19 pandemic. In this review, we aim to summarize different methods of TS, demonstrate parameters for TS assay designs, illustrate different TS panels, discuss their limitations, and present the challenges of TS concerning their clinical application for the molecular diagnosis of human diseases.
Keywords: molecular diagnosis, targeted sequencing, next-generation sequencing, COVID-19 detection, bacteria identification, cancer marker detection
1. Introduction
Next-generation sequencing (NGS), which marked a new era of sequencing technology, is increasingly used in clinical research, cancer biology, and pharmaceutical development owing to its exquisite resolution, accuracy, and cost-effectiveness. With the development of scalable, high-throughput, and user-friendly NGS devices, large-scale NGS experiments are now more affordable than before and can be completed within a reasonable turnaround time [1,2]. This has led to the expanding implementation of NGS from research to the clinical laboratory [1].
There are three types of NGS, namely whole genome sequencing (WGS), whole exome sequencing (WES), and targeted sequencing (TS). WGS provides the most comprehensive coverage and is therefore more suitable for novel gene discovery and research applications [3]. WES involves sequencing the exome, which consists only of exons, a subset of which contain the coding regions translated into protein [4]. Compared to WGS and WES, TS panels focus on a particular cluster of genomic regions and impose a lighter data burden in terms of bioinformatics and computational demand [1]. TS can simplify data interpretation while providing excellent coverage depth, which lowers cost and shortens turnaround time; both are essential in the many industrial and clinical applications where speed and cost matter most. Before a TS panel can be used, an additional target enrichment step is needed to enrich the genomic regions of interest relative to the genomic background. This step is crucial and ensures that the NGS process sequences the genomic targets efficiently and accurately. Specifically, the process focuses on amplifying the target genes or sequences of interest, thus allowing high sensitivity and specificity in identifying sequence variations in diseases [5]. The common sequence enrichment processes include the polymerase chain reaction-based amplicon technique and the hybrid capture-based technique, which will be elaborated on further below [6]. TS has now become an important routine technique in both clinical and research settings, with the advantages of high confidence, high accuracy, and relatively low cost. Because different approaches and techniques for mutation profiling remain broadly comparable, researchers and clinicians can choose from many commercial solutions in their assay designs.
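The difference in data burden between the three approaches can be illustrated with a rough yield estimate, where the bases to be sequenced are approximated as target size multiplied by mean depth. The sketch below is only an illustration: the target sizes and depths in `ASSAYS` are assumed round numbers, not values taken from the studies cited here, and the estimate ignores on-target rate, duplicates, and read-length effects.

```python
# Rough sequencing-yield comparison: bases needed ~ target size x mean depth.
# All sizes and depths are illustrative assumptions, not values from the review.

ASSAYS = {
    "WGS": {"target_bp": 3_100_000_000, "mean_depth": 30},      # assumed genome size
    "WES": {"target_bp": 40_000_000, "mean_depth": 100},        # assumed exome size
    "TS panel": {"target_bp": 1_000_000, "mean_depth": 1000},   # hypothetical 1 Mb panel
}


def required_bases(target_bp: int, mean_depth: int) -> int:
    """Return the approximate number of bases to sequence for one sample."""
    return target_bp * mean_depth


def summarize(assays: dict) -> dict:
    """Map each assay name to its approximate required yield in bases."""
    return {name: required_bases(**params) for name, params in assays.items()}


if __name__ == "__main__":
    for name, bases in summarize(ASSAYS).items():
        print(f"{name}: {bases / 1e9:.1f} Gb per sample")
```

Under these assumptions, the panel reaches a depth more than thirty times that of WGS while requiring roughly one hundredth of the raw yield, which is the trade-off described above.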
Sequencing genomic DNA extracted from normal tissues (germline) and from tumours (somatic) are the two most common approaches in research and clinical applications; comparing mutations through tumour molecular profiling supports appropriate treatment decisions [7] and accurate prognostic monitoring of cancer patients. In addition, throughout the Coronavirus Disease 2019 (COVID-19) pandemic, different target enrichment NGS panels were developed as a “molecular fingerprint” for viral detection, identification, and characterization in samples from patients who tested positive for COVID-19, as well as for surveillance testing and environmental monitoring [8]. Full sequence information allows scientists to understand disease transmission and to track strain origin and viral evolution. However, more effort needs to be put into assay design, including the purpose and scope of the assay, pre-analytic considerations, sequencing, bioinformatics, and interpretation and reporting, in order to develop cost-effective approaches for the molecular diagnosis of diseases. Most of the existing pipelines and approaches designed for WGS or WES can be applied to the data analysis of TS. However, because TS requires a high depth of coverage, it is critical to ensure that only high-quality variant calls are retained during data analysis, especially for data generated from fragmented and poor-quality DNA [1].
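As a minimal sketch of what retaining only high-quality variant calls can look like in practice, the example below applies three simple hard filters (total depth, number of variant-supporting reads, and call quality) to a list of calls. The `VariantCall` record, the threshold defaults, and the example data are all hypothetical and chosen for illustration; real pipelines typically apply further filters, such as strand bias or panel-of-normals checks, which are not shown.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical minimal record for one variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int       # total reads covering the position
    alt_reads: int   # reads supporting the alternate allele
    qual: float      # Phred-scaled call quality

    @property
    def vaf(self) -> float:
        """Variant allele frequency = alt reads / total depth."""
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(call, min_depth=500, min_alt_reads=5, min_qual=30.0):
    """Keep a call only if it meets every (assumed) threshold."""
    return (
        call.depth >= min_depth
        and call.alt_reads >= min_alt_reads
        and call.qual >= min_qual
    )


def filter_calls(calls, **thresholds):
    """Return the subset of calls that pass all hard filters."""
    return [c for c in calls if passes_filters(c, **thresholds)]


# Made-up example calls, not real data.
EXAMPLE_CALLS = [
    VariantCall("chr1", 100, "A", "T", depth=1200, alt_reads=18, qual=60.0),
    VariantCall("chr1", 200, "G", "C", depth=150, alt_reads=12, qual=55.0),   # low depth
    VariantCall("chr2", 300, "C", "T", depth=900, alt_reads=2, qual=45.0),    # few alt reads
    VariantCall("chr3", 400, "T", "G", depth=1000, alt_reads=25, qual=12.0),  # low quality
]

if __name__ == "__main__":
    for call in filter_calls(EXAMPLE_CALLS):
        print(call.chrom, call.pos, f"VAF={call.vaf:.3f}")
```

Hard thresholds are used here only because they are easy to read; the appropriate cut-offs depend on the panel, the sample type, and the variant caller.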
In this review, the applications of TS in various clinical and research assays, together with the important parameters arising from recent studies, will be discussed. The advantages and limitations of recent TS panels used to profile clinical samples will also be presented. This review aims to provide an overview of, and updates on, the use of TS in the field of microbiology and for human diagnostic purposes.
2. Targeted Sequencing
Many well-known gene mutations that drive disease pathogenesis, such as those in cancer driver genes, are widely used in clinical practice. TS panels focus on a selected number of these specific genes for diagnosis, prognosis, treatment monitoring, etc. Using TS panels in clinical settings can therefore reduce cost while providing greater confidence and better opportunities for insurance reimbursement [1].
For profiling clinical samples with lower tumour content and DNA quality, such as circulating tumour DNA (ctDNA) and formalin-fixed paraffin-embedded (FFPE) tissue, TS provides a greater sequencing depth of coverage (1000× or higher) than non-NGS-based techniques, such as allele-specific amplification refractory mutation system (ARMS), polymerase chain reaction (PCR), allele-specific PCR (AS-PCR), bead emulsification amplification and magnetics (BEAMing) technology, droplet digital PCR (ddPCR), and Sanger sequencing. This approach is capable of picking out mutations that are present in only a small fraction of malignant cells and can detect a variant allele frequency (VAF) as low as 0.1–0.2% when detecting minimal residual disease [1]. In addition, mutations that cause truncation or possible mRNA attenuation in any region of a tumour suppressor gene can be considered clinically significant, and the non-NGS techniques mentioned above cannot cover the entire regions of tumour-related genes [1].
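The link between sequencing depth and the lowest detectable VAF can be made concrete with a simple binomial model. The sketch below assumes that each read independently carries the variant with probability equal to the VAF and ignores sequencing errors and PCR duplicates, so it gives an optimistic estimate; the function names and the minimum read counts are assumptions made for illustration.

```python
from math import comb


def expected_alt_reads(depth: int, vaf: float) -> float:
    """Expected number of variant-supporting reads at a given depth and VAF."""
    return depth * vaf


def prob_at_least(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Simplified model: reads are independent and error-free.
    """
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    vaf = 0.001  # 0.1%, the lower bound quoted in the text
    for depth in (1000, 5000, 10000):
        print(
            f"depth={depth}: expected alt reads={expected_alt_reads(depth, vaf):.1f}, "
            f"P(>=1 alt read)={prob_at_least(depth, vaf, 1):.3f}, "
            f"P(>=5 alt reads)={prob_at_least(depth, vaf, 5):.3f}"
        )
```

Under this model, a variant at 0.1% VAF yields on average a single supporting read at 1000× and is observed at least once with a probability of about 0.63, which suggests why depths well above 1000× are generally needed to call variants at that frequency with confidence.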
2.1. The History of Sequencing and Discovery of TS
The first DNA sequencing method, known as Sanger sequencing, was developed by Frederick Sanger et al. in the 1970s [9]. In the following years, Sanger sequencing was continuously improved, for example through the replacement of phospho- or tritium-radiolabelling with fluorometric-based detection and through improved detection by capillary-based electrophoresis [10]. These improvements made sequencing more efficient and accurate. Next, the pyrosequencing technique was pioneered by Pål Nyrén and colleagues and later licensed to a biotechnology company named 454 Life Sciences. Pyrosequencing can be performed using natural nucleotides (instead of the heavily modified dNTPs used in the chain termination protocols) and observed in real time (instead of requiring lengthy electrophoresis) [11]. These techniques formed the backbone of NGS and stimulated the development of its applications. NGS has revolutionized basic and clinical research owing to its massively parallel analyses, ultra-high throughput, cost-effectiveness, and accuracy. Although the principles behind NGS and Sanger sequencing are similar, NGS can bind millions of DNA fragments to a flow cell and sequence them simultaneously, whereas Sanger sequencing can only sequence one fragment at a time. Currently, there are three major NGS systems: (i) the Roche 454 System, which detects the pyrophosphate released during nucleotide incorporation; (ii) the AB Sequencing by Oligo Ligation Detection (SOLiD) system; and (iii) the Illumina GA/HiSeq System, which is based on Solexa’s Genome Analyzer (GA) and uses sequencing by synthesis (SBS) [12].
James D. Watson’s genome was the first individual genome sequenced using the Roche/454 NGS platform and was completed in two months by Wheeler et al. [13]. WGS investigates the whole genome, including coding, non-coding, and mitochondrial DNA. Another objective of WGS is to discover novel and unknown genomic variants associated with the target diseases. The first disease-relevant variants identified by WGS were reported in a family with a recessive form of Charcot–Marie–Tooth disease [14]. Moreover, WGS has been widely applied in cancer genome sequencing and provides diagnostic and therapeutic information for cancer patients. WES was developed to capture the protein-coding regions of the genome. Compared to WGS and WES, TS focuses on specific genes and coding regions of interest in the genome with greater sequencing depth. The target genes or regions are those well known to be related to disease pathogenesis and to be clinically relevant. For instance, TS panels have been developed for detecting and monitoring inherited cancer gene mutations and somatic changes and are important for describing the landscape of genetic mutations that occur across different cancers. Information on mutations is important for identifying opportunities for therapeutic repurposing and for making therapeutic decisions. An example is the identification of microsatellite instability in colorectal carcinoma, which can affect the treatment strategy [15]. Additionally, Frampton et al. found clinically actionable alterations in 76% of the 2221 tumours studied, about three times the number detected by other current diagnostic tests, including Sanger sequencing, mass spectrometry genotyping, fluorescence in situ hybridization, and immunohistochemistry [16,17].