Comparative Genomic Analysis of SARS-CoV-2

Overview

This project is a comparative genomic and proteomic analysis of SARS-CoV-2. It aims to identify conserved genomic regions, as well as mutations in the Spike protein that may influence viral infectivity.
The analysis combined sequence retrieval, alignment, and functional interpretation, providing insight into how specific amino acid substitutions may affect viral structure and receptor binding.

Objectives

- Retrieve and compare SARS-CoV-2 genome and protein sequences from multiple global isolates.
- Perform local and global sequence alignments to identify conserved and variable genomic regions.
- Analyze Spike (S) protein mutations to evaluate their potential impact on viral infectivity and structural stability.

Methodology

Data Collection:
- Genomic FASTA sequences and protein annotations were obtained from NCBI and UniProt databases.
- Multiple SARS-CoV-2 isolates from different countries were selected to assess genomic diversity.
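
As an illustration of the preprocessing step, the sketch below reads FASTA-formatted text into a dictionary keyed by record ID. It is a minimal standard-library example, not the project's actual script; the function name `parse_fasta` and the two short records are hypothetical placeholders rather than real isolate data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID;
            # fall back to a generated name if the header is empty.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence data may span several lines; join and upper-case it.
            records[current_id] += line.upper()
    return records


# Hypothetical toy records, not real SARS-CoV-2 sequences.
example = """>isolate_A placeholder description
ATGTTTGTT
TTTCTTGTT
>isolate_B placeholder description
ATGTTTGTTTTTCTTGCT
""".splitlines()

records = parse_fasta(example)
print(records["isolate_A"])  # ATGTTTGTTTTTCTTGTT
```

Because the function accepts any iterable of lines, an open file handle for a downloaded FASTA file could be passed in directly.
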
Sequence Alignment:
- Pairwise and multiple sequence alignments were performed using tools such as BLAST (local pairwise alignment), EMBOSS Needle (global pairwise alignment), and Clustal Omega (multiple sequence alignment).
- Conserved motifs and mutation hotspots were identified, with emphasis on the Spike protein’s receptor-binding domain (RBD).
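
One simple way to separate conserved from variable positions is to scan a multiple sequence alignment column by column. The sketch below assumes the aligned sequences have already been exported (for example from Clustal Omega) into a dictionary of equal-length strings; the function name `variable_columns` and the toy alignment are hypothetical, and a gap character is treated as a difference.

```python
from typing import Dict, List


def variable_columns(aligned: Dict[str, str]) -> List[int]:
    """Return the 1-based alignment columns where the sequences disagree."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    variable = []
    for col in range(length):
        # A column is conserved only if every sequence shares one character.
        if len({s[col] for s in seqs}) > 1:
            variable.append(col + 1)
    return variable


# Hypothetical toy alignment, not real Spike protein data.
toy_alignment = {
    "isolate_A": "MFVFLVLLPL",
    "isolate_B": "MFVFLVLLPL",
    "isolate_C": "MFVFFVLL-L",
}
print(variable_columns(toy_alignment))  # [5, 9]
```

Columns missing from the returned list are fully conserved across the input isolates. Runs of variable columns would be candidate hotspots, although real analyses usually weigh how often each variant occurs rather than using a strict all-or-nothing test.
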
Functional Analysis:
- Observed amino acid substitutions were compared against published literature on their structural and functional effects.
- The potential effects of these mutations on binding affinity for the ACE2 receptor were evaluated.
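
Before substitutions can be checked against the literature, they need to be expressed in the usual notation of reference residue, reference position, and variant residue (D614G, for instance, would mean aspartate replaced by glycine at position 614). The sketch below derives that notation from a pairwise alignment of a reference and a variant protein. The function name `list_substitutions` and the toy sequences are hypothetical, and insertions and deletions are skipped rather than reported.

```python
from typing import List


def list_substitutions(ref_aligned: str, var_aligned: str) -> List[str]:
    """List substitutions as <ref residue><ref position><variant residue>."""
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have the same length")
    substitutions = []
    ref_pos = 0  # position in the ungapped reference sequence
    for ref_res, var_res in zip(ref_aligned, var_aligned):
        if ref_res != "-":
            ref_pos += 1
        if ref_res == "-" or var_res == "-":
            continue  # insertion or deletion, not a substitution
        if ref_res != var_res:
            substitutions.append(f"{ref_res}{ref_pos}{var_res}")
    return substitutions


# Hypothetical toy alignment: the variant carries one insertion and one substitution.
reference = "MFV-FLVL"
variant = "MFVAFLGL"
print(list_substitutions(reference, variant))  # ['V6G']
```

Positions are counted along the ungapped reference, so the numbering stays comparable with published coordinates even when the variant contains insertions.
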

Tools and Technologies

Databases: NCBI, UniProt
Software: BLAST, EMBOSS, Clustal Omega
Languages: Python (for sequence preprocessing and visualization)

Reflections

This project provided foundational experience in computational genomics and protein analysis, strengthening my understanding of how sequence-level variations can influence biological function.
It also reinforced the role of bioinformatics in real-world biomedical challenges, particularly in pandemic-scale genomic research.