I have a FASTA file with sequences associated with states and their cities. Is it possible to use Python in a Jupyter notebook to run a multiple sequence alignment (MSA) with Clustal and then create a phylogenetic tree from the aligned sequences? I am not sure where to start, and there was no clear direction when I was given the assignment.
-
Hello! Welcome to Stack Overflow. Please make sure that the question clearly describes your problem. Ideally, the question should be about a single technical issue, illustrated with a code example. In this case, you could provide the contents of a simple FASTA file, show how MSA would look if performed on your FASTA file (step 1), and also the end result (step 2). – afaf12 Dec 18 '21 at 21:47
-
You should also specify what kind of MSA algorithm you want to use. – afaf12 Dec 18 '21 at 21:59
-
Please provide enough code so others can better understand or reproduce the problem. – Community Dec 25 '21 at 22:45
1 Answer
Disclaimer: I have no background in biology.
As far as I understand, a FASTA file contains sequences of letters, and aligning means finding whether sequence #1 contains or partially overlaps with sequence #2. That is string manipulation, which Python is very good at. You need to write a function that takes two strings and returns what you need.
I found a library on GitHub that seems to do this; I don't know whether using it is permitted in your case. The following code fragment is taken from its documentation: https://github.com/benchling/clustalo-python
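To make the "function that takes two strings" idea concrete, here is a minimal sketch using only the standard library. The function name `longest_shared_run` is made up for illustration, and it only finds the longest exact stretch the two sequences share. A real aligner such as Clustal also handles gaps and mismatches, so treat this as a starting point rather than an alignment algorithm.

```python
from difflib import SequenceMatcher


def longest_shared_run(seq_a: str, seq_b: str) -> tuple[int, int, str]:
    """Return (start in seq_a, start in seq_b, shared substring).

    Hypothetical helper: exact matching only, no gaps or mismatches.
    """
    # autojunk=False stops difflib from ignoring very frequent letters,
    # which matters for long sequences over a 4-letter alphabet.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    match = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return match.a, match.b, seq_a[match.a:match.a + match.size]


start_a, start_b, shared = longest_shared_run("AAATCGGAAA", "CGGA")
print(start_a, start_b, shared)  # prints: 4 0 CGGA
```

Here the second sequence is fully contained in the first, starting at index 4, which matches the offset in the aligned output of the library example.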
```python
from clustalo import clustalo
input = {
    'seq1': 'AAATCGGAAA',
    'seq2': 'CGGA'
}
aligned = clustalo(input)
# aligned is a dict of aligned sequences:
#   seq1: AAATCGGAAA
#   seq2: ----CGGA--
```
Once you can estimate how similar the sequences are to each other, you can arrange them in order of similarity.
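As a rough sketch of that step, assuming you already have a dict of aligned, equal-length sequences: the code below computes the fraction of differing columns for each pair and then repeatedly merges the two closest groups (average linkage, in the spirit of UPGMA but without branch lengths). The names `p_distance` and `cluster` and the three example sequences are invented for illustration, and this is not a substitute for a proper phylogenetics package.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned columns where the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    return sum(x != y for x, y in columns) / len(columns)


def cluster(aligned: dict[str, str]):
    """Naive average-linkage clustering; returns a nested tuple of names."""
    # Maps each subtree (a name or a nested tuple) to its member names.
    clusters = {name: [name] for name in aligned}

    def dist(t1, t2) -> float:
        # Average distance over all member pairs of the two groups.
        pairs = [(m, n) for m in clusters[t1] for n in clusters[t2]]
        total = sum(p_distance(aligned[m], aligned[n]) for m, n in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of groups and merge them into one subtree.
        left, right = min(combinations(clusters, 2), key=lambda p: dist(*p))
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


# Made-up aligned sequences, only for demonstration.
example = {
    "seq1": "AAATCGGAAA",
    "seq2": "AAATCGGTAA",
    "seq3": "CCATCGCTTT",
}
tree = cluster(example)
print(tree)
```

With this input, `seq1` and `seq2` differ in only one column, so they are grouped first and `seq3` joins last.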
You can draw inside a Jupyter notebook; an example can be seen here: Using Turtle in Google Colab. Alternatively, you could display the tree as text, using spaces, tabs, etc. for the layout.
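For the text option, one possible sketch is below. It assumes the tree is stored as nested tuples of sequence names, which is a representation chosen here for illustration and not something a particular library produces. Each level of nesting gets four more spaces of indentation, and `+` marks an internal node.

```python
def render_tree(node, depth: int = 0) -> list[str]:
    """Return the lines of an indented text view of a nested-tuple tree."""
    indent = "    " * depth
    if isinstance(node, str):
        # Leaf: a sequence name.
        return [f"{indent}{node}"]
    # Internal node: print a marker, then each child one level deeper.
    lines = [f"{indent}+"]
    for child in node:
        lines.extend(render_tree(child, depth + 1))
    return lines


# Hypothetical tree: seq1 and seq2 are closest, seq3 joins them last.
example_tree = ("seq3", ("seq1", "seq2"))
print("\n".join(render_tree(example_tree)))
```

In a notebook cell this prints a five-line outline with `seq1` and `seq2` nested one level deeper than `seq3`.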
