SARS-CoV-2: Biology, Origins, and How Open Science is Accelerating the Search for Therapeutic Answers
July 13, 2020

DNAstack's bioinformatician Heather Ward breaks down the biology of the novel coronavirus responsible for the COVID-19 outbreak.


Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is the novel coronavirus responsible for the COVID-19 outbreak that first emerged in early December 2019 in Wuhan, China. As of March 20, 2020, SARS-CoV-2 has resulted in nearly 250,000 cases worldwide and has claimed the lives of over 10,000 people.

Here, I’ll briefly break down the potential origins and viral life cycle of SARS-CoV-2, how it differs from the virus responsible for the 2002 outbreak, and how genomics and open science can be used to explore and develop therapeutics that will help mitigate this global threat.

SARS-CoV-2 and related coronaviruses

SARS-CoV-2 is a coronavirus, a member of a family of positive-sense single-stranded RNA (ssRNA) viruses named for the resemblance of their virions to the solar corona. Other ssRNA viruses cause diseases that range widely in severity and include HIV, West Nile virus, and the viruses behind the common cold.

Several coronaviruses are known to infect humans, the best known being SARS-CoV (responsible for the 2002 outbreak) and MERS-CoV (Middle East Respiratory Syndrome Coronavirus). Both of these coronaviruses, as well as the current SARS-CoV-2, are believed to have originated in bats, which act as a natural reservoir for a number of coronaviruses. In each case, the virus is postulated to have passed to humans via an intermediary host (civet cats in the case of SARS-CoV, and dromedary camels for MERS-CoV). Several potential hosts have been suggested as the intermediary for the current SARS-CoV-2, including snakes and pangolins.

It’s important to note that the majority of these bat-endemic coronaviruses are not able to infect humans, and mutation is required for a coronavirus to be able to transition to a new host organism. Working out which parts of the genome must mutate for a virus to target a new host first requires an understanding of the basics of the coronavirus life cycle.

[caption id="attachment_3926" align="aligncenter" width="701"]

Figure 1: SARS-CoV-2 virion. [/caption]

The SARS-CoV-2 viral life cycle

The major steps of the viral life cycle of SARS-CoV-2 as well as other coronaviruses include:

  1. Binding of the virus to a receptor on a target host cell
  2. Membrane fusion between the viral envelope and the host cell, which releases the viral genome into the host cell
  3. Replication of the viral genome
  4. Transcription and translation of viral structural proteins
  5. Assembly and export of mature virions

Mature virions (packaged viral particles including the viral genome and structural proteins, see the SARS-CoV-2 virion pictured in figure 1) released from an infected host cell may infect other cells and continue the infection cycle.

If virions are unable to bind to host cell receptors or if membrane fusion does not occur, infection will not take place. These key steps are both mediated by a particular viral protein — the spike protein.

The Spike protein

The spike protein is a homotrimeric (made up of three identical peptides) transmembrane protein found studded around the exterior of the mature virion. Each monomer (one of the three identical peptides) is composed of two subunits: the S1 subunit, which is responsible for recognizing and binding to a host cell receptor, and the S2 subunit, which facilitates membrane fusion and release of the viral genome into the host cell (see figure 2).

Because the virus can only infect host cells that it is able to bind to, the S1 subunit of the spike protein is responsible for host specificity, that is, the range of hosts that the virus is able to infect. For a virus to infect a new organism (e.g. in the transition between bat and human hosts), the receptor binding domain of the S1 subunit must gain the ability to bind to a receptor found in that new host. In both SARS-CoV and SARS-CoV-2, the human receptor appears to be the protein angiotensin-converting enzyme 2 (ACE2), which is found on the surface of cells in the human respiratory tract. Interestingly, despite targeting the same receptor, the two viruses differ at many of the key amino acids that interact with ACE2 and that were previously thought to be essential for binding. These residues appear to be almost completely distinct between the SARS-CoV and SARS-CoV-2 receptor binding domains, implying that specificity for the same receptor may have evolved independently in each virus.
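To make the comparison above concrete, here is a minimal Python sketch of how one might check whether two aligned receptor binding domains agree at a chosen set of contact positions. The sequences and column indices below are made-up toy values, not real SARS-CoV or SARS-CoV-2 data, and the function name is hypothetical; the point is only that two sequences can share much of their overall sequence while differing at every position that matters for binding.

```python
def compare_positions(aligned_a, aligned_b, positions):
    """Compare two aligned sequences at selected 0-based alignment columns.

    Returns the (column, residue_a, residue_b) rows and the fraction of
    those columns where both sequences carry the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    rows = [(p, aligned_a[p], aligned_b[p]) for p in positions]
    shared = sum(1 for _, a, b in rows if a == b and a != "-")
    fraction = shared / len(rows) if rows else 0.0
    return rows, fraction


# Toy aligned fragments (hypothetical, for illustration only).
toy_rbd_a = "NYLYRLFRKS"
toy_rbd_b = "NYKYRYLRHG"

# Hypothetical receptor-contact columns in the toy alignment.
contact_columns = [2, 5, 6, 8, 9]

contact_rows, contact_identity = compare_positions(toy_rbd_a, toy_rbd_b, contact_columns)
_, overall_identity = compare_positions(toy_rbd_a, toy_rbd_b, range(len(toy_rbd_a)))

print("identity over all columns:", overall_identity)
print("identity at contact columns:", contact_identity)
for column, res_a, res_b in contact_rows:
    print(column, res_a, res_b)
```

With real data, the aligned sequences would come from a multiple sequence alignment and the contact columns from a structure of the spike protein bound to ACE2.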

[caption id="attachment_3928" align="aligncenter" width="587"]

Figure 2: Structure of the SARS-CoV spike protein monomer (blue and green) bound to the ACE2 receptor (yellow). The spike protein comprises the S1 (blue) and S2 (green) subunits. S1/S2 and S2' cleavage sites are labelled in red. Generated using open-source PyMOL™ from the cryo-EM structure.[/caption]

Activation of the spike protein following receptor binding

Receptor binding alone is not sufficient for viral infection. Binding initiates conformational changes in the spike protein that lead to membrane fusion and infection, but another step is required before fusion can take place: cleavage of the spike protein.

There are at least two cleavage sites on the spike protein that must be cut prior to viral entry: one between the S1 and S2 subunits (S1/S2 site) and one internal to the S2 subunit (S2' site) (see figure 2, red). Cleavage at the S1/S2 site primes the protein and leads to cleavage of the S2' site, which is necessary for membrane fusion. The specific proteases (proteins that cut other proteins) that are able to perform the cleavage steps depend on the amino acid sequence that is present at each cleavage site; in many cases, several different proteases are able to cut the same site with greater or lesser efficiency.

Similar to the host-specificity of the receptor binding domain, if cleavage sites are not recognized by host proteases, cleavage and therefore infection will not be able to occur in that host. This means that both a receptor binding domain that recognizes a host target as well as cleavage sites that can be cut by host proteases are required for transmission of the virus to a novel host. For example, some bat coronaviruses have been found that are able to bind to human proteins but fail to initiate infection because their spike protein is not cleaved in human hosts.
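The two requirements described above can be written down as a simple logical model. The Python sketch below is a deliberate simplification: it treats receptor binding and proteolytic cleavage as yes/no properties and lumps both cleavage sites together. The class names, the function name, and the protease labels other than furin are hypothetical placeholders rather than real biological data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spike:
    binds_receptors: frozenset  # host receptors the S1 subunit can bind
    cleaved_by: frozenset       # proteases able to cut its cleavage sites


@dataclass(frozen=True)
class Host:
    receptors: frozenset  # receptors present on the host's cells
    proteases: frozenset  # proteases available in the host


def can_enter(spike, host):
    """Entry needs BOTH a bindable receptor AND a protease that cleaves the spike."""
    binds = bool(spike.binds_receptors & host.receptors)
    cleaved = bool(spike.cleaved_by & host.proteases)
    return binds and cleaved


human = Host(
    receptors=frozenset({"ACE2"}),
    proteases=frozenset({"furin", "hypothetical_protease_A"}),
)

# A spike that binds a human receptor but is only cut by a protease humans lack.
bat_cov_spike = Spike(
    binds_receptors=frozenset({"ACE2"}),
    cleaved_by=frozenset({"hypothetical_bat_protease"}),
)

# A spike that both binds a human receptor and is cut by a human protease.
adapted_spike = Spike(
    binds_receptors=frozenset({"ACE2"}),
    cleaved_by=frozenset({"furin"}),
)

print(can_enter(bat_cov_spike, human))
print(can_enter(adapted_spike, human))
```

In this toy model the first spike fails to enter despite binding ACE2, which mirrors the bat coronavirus example in the paragraph above.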

A novel cleavage site on SARS-CoV-2

In SARS-CoV-2, a novel cleavage site has been discovered at the S1/S2 junction which is cleaved by a ubiquitous human protease known as furin. The inclusion of this novel furin site allows the SARS-CoV-2 spike protein to be cleaved during biosynthesis — this means that the protein is ‘primed’ even prior to release of the virion from the host cell. This is in contrast to the spike protein produced by SARS-CoV, which lacks this site and is released from the cell intact, requiring later cleavage before it can facilitate membrane fusion.
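As a rough illustration of how such a site can be spotted in sequence data, the Python sketch below scans a protein sequence for the minimal furin recognition motif R-X-X-R (arginine, any two residues, arginine). The two short fragments are assumed to represent the region around the S1/S2 junction as reported in the literature; they are illustrative only and should be checked against a reference sequence before being relied on. A motif match is also only a hint, since actual cleavage depends on structural context as well.

```python
import re

# Minimal furin recognition motif: R-X-X-R (R = arginine, X = any residue).
# The lookahead allows overlapping matches to be reported.
FURIN_MOTIF = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein_seq):
    """Return (0-based start index, matched residues) for each R-X-X-R match."""
    return [(m.start(), m.group(1)) for m in FURIN_MOTIF.finditer(protein_seq.upper())]


# Illustrative fragments near the S1/S2 junction (assumed, not full sequences).
sars_cov_2_fragment = "TNSPRRARSVASQ"
sars_cov_fragment = "SLLRSTSQ"

print("SARS-CoV-2 fragment:", find_furin_like_sites(sars_cov_2_fragment))
print("SARS-CoV fragment:", find_furin_like_sites(sars_cov_fragment))
```

On these fragments the scan reports a single match (RRAR) in the SARS-CoV-2 fragment and none in the SARS-CoV fragment, consistent with the contrast described above.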

It is unclear whether priming during biosynthesis has an impact on viral infectivity. A 2006 study by Follis et al. found that introducing a furin cleavage site at the S1/S2 junction of SARS-CoV’s spike protein enhanced cell-cell membrane fusion, but found no evidence of an accompanying increase in infectivity. It remains to be seen how the novel furin site in SARS-CoV-2 will impact its infectivity and spread.

A key target for therapeutic agents

Researchers across the globe are searching the SARS-CoV-2 genome for features that will allow it to be targeted by therapeutic agents. Because the spike protein plays a fundamental role in mediating host specificity and viral infection, it is an attractive target for drug development. In particular, mechanisms targeting receptor binding, proteolytic cleavage, and membrane fusion may prove effective in attenuating the virus’s ability to infect human cells. Given the genetic similarity between SARS-CoV-2 and SARS-CoV, including their shared receptor target, it is possible that agents shown to be effective against SARS-CoV may also prove effective at slowing SARS-CoV-2.

SARS-CoV-2 Research

The swift response of researchers worldwide to study SARS-CoV-2 and to share sequencing data publicly has allowed for rapid insights into key genetic features that will prove indispensable in the days and months to come. This tremendous, coordinated global effort to elucidate the origins and mechanisms of the virus could not have been accomplished without the aid of modern technologies allowing researchers to share data quickly across geopolitical borders. This reaffirms the essential role of technology in facilitating science, especially in the ability to respond quickly to global emergencies.

To that end, DNAstack has developed a beacon for SARS-CoV-2 where users can explore aggregated genetic variants discovered by labs worldwide. Explore it here:

About the Author

Heather is part of the Data Science Team at DNAstack, where she authors, tests, and runs analytical pipelines for internal and customer projects.

References and Further Reading

  • Belouzard, S., Chu, V.C. and Whittaker, G.R. 2009. Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. PNAS 106(14): 5871–5876.
  • Chan, J.F.W., Kok, K-H., Zhu, Z., Chu, H., To, K. K-W., Yuan, S. and Yuen, K-Y. 2020. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerging Microbes & Infections 9: 221–246.
  • Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N.G. and Decroly, E. 2020. The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade. Antiviral Research 176: 104742.
  • Gong, S. and Bao, L-L. 2018. The battle against SARS and MERS coronaviruses: Reservoirs and animal models. Animal Model Exp Med. 1:125–133.
  • Follis, K.E., York, J. and Nunberg, J.H. 2006. Furin cleavage of the SARS coronavirus spike glycoprotein enhances cell-cell fusion but does not affect virion entry. Virology. 350:358–369.
  • Millet, J.K. and Whittaker, G.R. 2015. Host cell proteases: critical determinants of coronavirus tropism and pathogenesis. Virus Research. 202: 120–134.
  • Racaniello, V. Furin cleavage site in the SARS-CoV-2 coronavirus glycoprotein. Virology blog. Published February 13, 2020. Accessed March 10, 2020.
  • Song, W., Gui, M., Wang, X. and Xiang, Y. 2018. Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2. PLOS Pathogens
  • Walls, A.C., Park, Y-J., Tortorici, M.J., Wall, A., McGuire, A.T. and Veesler, D. 2020. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 180: 1–12.
  • Wong, M.C., Cregeen, S.J., Ajami, N.J. and Petrosino, J.F. 2020 (preprint). Evidence of recombination in coronaviruses implicating pangolin origins of nCoV-2019. bioRxiv, preprint.
  • Xia, S., Zhu, Y., Liu, M., Lan, Q., Xu, W., Wu, Y., Ying, T., Liu, S., Shi, Z., Jiang, S. and Lu, L. 2020. Fusion mechanism of 2019-nCoV and fusion inhibitors targeting HR1 domain in spike protein. Cellular & Molecular Immunology
  • Xu, X., Chen, P., Wang, J., Feng, J., Zhou, H., Li, X., Zhong, W. and Hao, P. 2020. Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission. Science China Life Sciences 63(3): 457–460.

About DNAstack

DNAstack’s mission is to improve the lives of millions of people by breaking down barriers to data sharing and discovery. DNAstack develops standards and technologies for scientists to more efficiently find, access, and analyze the world’s exponentially growing volumes of genomic and biomedical data. For additional support or partnership interest, please contact us by email to

Photo Credits 

Figure 1: CDC/Alissa Eckert, MS; Dan Higgins, MAMS

Figure 2: Song et al., 2018; PDB accession 6ACK.