Posts

Showing posts from July, 2020

My COVID-19 jupyter notebook

Over the last few weeks I have been working on understanding the COVID-19 genome. As a computer scientist, I was able to work out a way to compare genome sequences, and this post is the result of that comparison. It needs far more understanding than I have, so I am sharing this unfinished work for anyone interested in the topic who can lend a hand with understanding genomic similarities, a field that is not my expertise. First of all, let me explain the idea: prompted by a Kaggle call to arms, we want to understand how the virus evolved to spread among humans, and the best place to start is analysing the genomic sequences. I started with the genomes published in GenBank and decided on the following strategy:

1. Download the FASTA files found on GenBank of:
    - COVID-19 from Wuhan
    - SARS found in bats
    - COVID-19 reported in Spain
    ...
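As a rough illustration of the first step, here is a minimal sketch of reading FASTA records and scoring how similar two sequences are. It is not the method used in the notebook: the helper names (`read_fasta`, `kmer_set`, `jaccard_similarity`), the choice of k-mer Jaccard similarity, and the toy records are all assumptions made for this example. Real GenBank downloads would be opened as files instead of the in-memory string used here.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_set(sequence, k=4):
    """Return the set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k=4):
    """Share of distinct k-mers the two sequences have in common (0.0 to 1.0)."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy, made-up records (hypothetical, NOT real GenBank entries).
toy_fasta = StringIO(
    ">toy_a made-up record\nAGATCTGTTC\nTCTAAACGAA\n"
    ">toy_b made-up record\nAGATCTGTTC\nTCTAAACGTT\n"
)
records = list(read_fasta(toy_fasta))
similarity = jaccard_similarity(records[0][1], records[1][1])
print(f"{records[0][0]} vs {records[1][0]}: {similarity:.2f}")
```

A set-based k-mer score is cheap to compute on genomes of tens of thousands of nucleotides, but it ignores where in the genome a difference occurs, so treat it as a first look and not as an alignment.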

Covid-19 Genome analysis

In [2]:
import pandas
import numpy as np
import sklearn.cluster
import distance

In [3]:
genomes_df = pandas.read_csv("/Users/johncalvo/Downloads/covid_sequences.csv")
genomes_df.head()

Out[3]:
   Virus version                                        Nucleotides sequence
0  >MT198652 |Severe acute respiratory syndrome c...  AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTC...
1  >MT198653 |Severe acute respiratory syndrome c...  AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTC...
2  >MT192758 |Severe acute respiratory syndrome c...  CCGCAATCCTGCTAACAATGCTGCAATCGTGCTACAACTTCCTCAA...
3  >MT186679 |Severe acute respiratory syndrome c...  CGCGATCAAAACAACGTCGGCCCCAAGGTTTACCCAATAATACTGC...
4  ...
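The imports of `distance` and `sklearn.cluster` hint that the sequences are compared by edit distance and then grouped, but the excerpt stops before that step, so the following is only a guess at the general idea. It is a minimal sketch in plain Python: the functions `levenshtein` and `pairwise_distances` are hypothetical helpers written for this example, and the three short strings are toy data, not the contents of `covid_sequences.csv`.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def pairwise_distances(sequences):
    """Symmetric matrix (list of lists) of edit distances between all sequences."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy strings for illustration only (assumption: real inputs would be the
# values of the "Nucleotides sequence" column).
toy_sequences = ["AGATCTGTTC", "AGATCTGTTA", "CCGCAATCCT"]
matrix = pairwise_distances(toy_sequences)
for row in matrix:
    print(row)
```

A matrix like this could then be handed to a clustering routine that accepts precomputed distances or similarities. The pure-Python loop is quadratic in sequence length, so on full genomes of roughly 30,000 nucleotides it would be slow; a compiled edit-distance library would be the practical choice there.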