COVID-19 Case in Cambodia


Incubation periods and transmissibility of COVID-19 are questions of current worldwide concern, but they are difficult to answer because its nonspecific symptoms are shared by a variety of respiratory viruses. In this setting, genetic characterization of the virus from geographically diverse patient samples is key to inferring the rate of spread. However, as the virus reaches resource-limited settings close to the outbreak’s epicenter, such as Cambodia, Laos, and Myanmar, basic challenges in sample collection, contact tracing, and surveillance hinder disease containment, let alone the ability to sequence new cases in-country. Implementing sequencing and post-sequencing data analysis in-country shortens the time to pathogen identification and lets local scientists inform public health leaders as they combat emerging infections.


In a rapidly implemented response to the nCoV-2019 outbreak (manuscript), the NIH-CNM team and the Institut Pasteur du Cambodge used metagenomic next-generation sequencing (mNGS) and the CZ ID bioinformatics platform to analyze the Cambodian nCoV-2019 index case in less than 48 hours from sample receipt.

This project had two goals:

  1. Characterize Cambodia's first case of the novel virus using in-country mNGS approaches.
  2. Determine the full genome sequence and release it for immediate public use, comparing it with published genomes from around the world to understand global transmission patterns.

What were the scientists’ hypotheses?

  1. In-country mNGS performed on an iSeq100 could be used to confirm and further characterize the genomic sequence of SARS-CoV-2 in the first RT-PCR-confirmed case of the novel coronavirus in Cambodia.
  2. Given that the patient traveled directly from Wuhan to Cambodia on January 23, his virus should be phylogenetically similar to the initial sequences published by Chinese researchers in Wuhan.
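Hypothesis 2 rests on comparing the Cambodian genome with earlier Wuhan sequences. A real answer requires a phylogenetic analysis (such as the Nextstrain comparison described in this case study), but a crude first check is simply to list the sites where two genomes differ. The Python sketch below assumes the two sequences have already been aligned to the same length; the function names are hypothetical, and the reported positions are alignment columns, which match reference coordinates only when the reference row contains no gaps.

```python
# Treat gaps and ambiguous bases as uninformative rather than as differences.
GAP_OR_AMBIGUOUS = {"N", "-"}


def _comparable_pairs(query: str, reference: str):
    """Yield (1-based column, ref base, query base) for columns where neither
    pre-aligned sequence has a gap or an ambiguous 'N' base."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    for col, (r, q) in enumerate(zip(reference.upper(), query.upper()), start=1):
        if r not in GAP_OR_AMBIGUOUS and q not in GAP_OR_AMBIGUOUS:
            yield col, r, q


def differing_sites(query: str, reference: str) -> list[tuple[int, str, str]]:
    """List (column, ref base, query base) for every comparable mismatch."""
    return [(col, r, q) for col, r, q in _comparable_pairs(query, reference) if r != q]


def percent_identity(query: str, reference: str) -> float:
    """Percentage of comparable columns at which the two sequences agree."""
    pairs = list(_comparable_pairs(query, reference))
    if not pairs:
        return 0.0
    matches = sum(r == q for _, r, q in pairs)
    return 100.0 * matches / len(pairs)


# Toy example with made-up sequences (not real SARS-CoV-2 data).
reference = "ATGCGTACGTTAGC"
query = "ATGCGTACNTTGGC"
print(differing_sites(query, reference))             # [(12, 'A', 'G')]
print(round(percent_identity(query, reference), 1))  # 92.3
```

Skipping `N` positions matters for low-coverage clinical samples: unresolved bases would otherwise be miscounted as substitutions and make two closely related genomes look more distant than they are.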

How was the data generated?

The Institut Pasteur du Cambodge, the National Referral Laboratory for COVID-19, performed RNA extraction on a nasopharyngeal swab from the index case. On February 1st, the NIH-CNM team prepared sequencing libraries from the extracted RNA, sequenced them on an iSeq100, demultiplexed the reads into per-sample FASTQ files, and compiled the run results in Illumina’s BaseSpace. The data were then uploaded to CZ ID and processed against the CZ ID reference database current at the time, built from NCBI on 2019-09-17. Although that database did not contain the SARS-CoV-2 reference sequences, which were deposited to NCBI in January 2020, conclusions could still be drawn from the data. CZ ID’s NCBI reference database was then updated on 2020-02-10 with the most recent NCBI release, which included the newly added SARS-CoV-2 sequences, and the analysis was rerun against it to confirm the match to SARS-CoV-2 in the PCR-positive index case. Further experiments used an enrichment protocol to obtain a full genome sequence of SARS-CoV-2 from the index patient. The sequence generated in this study was submitted to the GISAID repository (GISAID EPI ISL 411902) and is available on Nextstrain for comparison with other SARS-CoV-2 sequences worldwide.

What conclusions were drawn from the data?

  1. SARS-CoV-2 was identified in the index case sample (Patient 1), confirming the PCR-positive result. In the pipeline run against the most recent reference database, CZ ID detected 582 reads aligning to Wuhan seafood market pneumonia virus (taxID 2697049, NCBI’s early name for SARS-CoV-2) with an average percent identity of 100% in NT. CZ ID assembled 26 contigs from those reads, covering 33.2% of NCBI accession MN985325.1.
  2. Re-sequencing the index case sample to try to obtain full genome coverage yielded more reads aligning to SARS-CoV-2 (819 reads, 22 contigs) but had limited impact on total coverage.
  3. Using a target enrichment strategy, a full-length contig was assembled from the sample with an average depth of 14.9x. The SARS-CoV-2 genome sequence was uploaded to the GISAID repository (GISAID EPI ISL 411902).
  4. As expected, the SARS-CoV-2 genome from the Cambodian index case was highly similar to the other published SARS-CoV-2 genomes, showing only one SNP, at position 25,654 in ORF3.
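The coverage and depth figures reported above are two different summaries. Coverage (breadth) is the share of reference positions touched by at least one contig or read, while average depth is how many reads cover a position on average. The Python sketch below shows one standard way to compute each, assuming 1-based, inclusive alignment coordinates and a per-base depth array are already available from an aligner; the function names are hypothetical, and this is not CZ ID’s implementation.

```python
def coverage_breadth(intervals: list[tuple[int, int]], ref_length: int) -> float:
    """Percent of reference positions covered by at least one interval.

    Each interval is a 1-based, inclusive (start, end) span on the reference,
    e.g. where a contig aligns. Overlapping or adjacent spans are merged so
    that no position is counted twice.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the running block: close it out and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / ref_length


def mean_depth(per_base_depth: list[int]) -> float:
    """Average number of reads covering each reference position."""
    return sum(per_base_depth) / len(per_base_depth) if per_base_depth else 0.0


# Toy example with made-up numbers (not the values from this case study).
contigs = [(1, 100), (50, 150), (300, 399)]  # first two overlap and merge to 1-150
print(coverage_breadth(contigs, ref_length=1000))  # 25.0
print(mean_depth([10, 20, 0, 30]))                 # 15.0
```

Merging overlapping intervals before summing is the key design choice: assembled contigs often overlap, so adding up raw contig lengths would overstate breadth. This also explains how re-sequencing can add reads and contigs while barely changing coverage, since new reads that fall on already-covered regions raise depth but not breadth.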