How to use MSA and Clustal for python inside a Jupyter notebook?

Question

I have a FASTA file with sequences associated with states and their cities. Is it possible to use Python in a Jupyter notebook to run a multiple sequence alignment (MSA) with Clustal, and then create a phylogenetic tree from the aligned sequences? I am not sure where to start, and no clear direction was given with the assignment.

Hello! Welcome to Stack Overflow. Please make sure that the question clearly describes your problem. Ideally, the question should be about a single technical issue, illustrated with a code example. In this case, you could provide the contents of a simple FASTA file, show how MSA would look if performed on your FASTA file (step 1), and also the end result (step 2). — afaf12, Dec 18 '21 at 21:47
You should also specify what kind of MSA algorithm you want to use. — afaf12, Dec 18 '21 at 21:59
Please provide enough code so others can better understand or reproduce the problem. — Community, Dec 25 '21 at 22:45

afaf12 · Answer 1 · 2021-12-18T21:49:52.363

Disclaimer: I have no background in biology.

As far as I understand, a FASTA file contains sequences of letters, and aligning them means finding where sequence #1 contains or partially overlaps sequence #2. That's string manipulation, which Python is very good at. You need to write a function that takes two strings and returns what you need.
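
To make the "function that takes two strings" idea concrete, here is a toy sketch that slides the shorter sequence along the longer one and pads it with gaps at the best-matching position. The function names are made up for illustration, and this is not what Clustal does (it does not insert gaps inside a sequence and only handles two sequences); it only shows the string-manipulation idea.

```python
def best_ungapped_offset(long_seq, short_seq):
    """Slide short_seq along long_seq; return (offset, matches) of the best fit."""
    if len(short_seq) > len(long_seq):
        raise ValueError("short_seq must not be longer than long_seq")
    best = (0, -1)
    for offset in range(len(long_seq) - len(short_seq) + 1):
        window = long_seq[offset:offset + len(short_seq)]
        # Count positions where the two strings carry the same letter.
        matches = sum(a == b for a, b in zip(window, short_seq))
        if matches > best[1]:
            best = (offset, matches)
    return best


def pad_to_alignment(long_seq, short_seq):
    """Pad short_seq with '-' so it lines up with long_seq at the best offset."""
    offset, _ = best_ungapped_offset(long_seq, short_seq)
    right = len(long_seq) - offset - len(short_seq)
    return "-" * offset + short_seq + "-" * right


print(pad_to_alignment("AAATCGGAAA", "CGGA"))
```

For real data you would most likely want a proper alignment tool, since real sequences usually need gaps in both sequences.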

I found a library on GitHub that seems to do this, though I don't know whether using it is permitted in your case: https://github.com/benchling/clustalo-python. The following code fragment is taken from its documentation.

from clustalo import clustalo
input = {
    'seq1': 'AAATCGGAAA',
    'seq2': 'CGGA'
}
aligned = clustalo(input)
# aligned is a dict of aligned sequences:
#   seq1: AAATCGGAAA
#   seq2: ----CGGA--

Once the sequences are aligned, you can estimate how similar each pair is and display the sequences in order of similarity.
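
One simple way to estimate similarity from aligned sequences is to count the fraction of columns where two sequences differ, skipping columns where either has a gap. This is only a sketch of that idea; `p_distance` is a name I made up, and `seq3` is an invented third sequence added so there is more than one pair to compare.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing columns, ignoring columns with a gap in either."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    # If no column can be compared, treat the pair as maximally distant.
    return mismatches / compared if compared else 1.0


aligned = {
    "seq1": "AAATCGGAAA",
    "seq2": "----CGGA--",
    "seq3": "AAATCGTAAA",  # hypothetical, for illustration only
}

# Sort all pairs from most similar to least similar.
pairs = sorted(
    (p_distance(aligned[a], aligned[b]), a, b)
    for a, b in combinations(aligned, 2)
)
for dist, a, b in pairs:
    print(f"{a} vs {b}: {dist:.2f}")
```

Ignoring gap columns is a design choice; a real phylogenetics tool would most likely use a more careful distance model.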

You can draw inside a Jupyter notebook; for an example, see "Using Turtle in Google Colab". Alternatively, you could display the tree as text, using spaces, tabs, etc. to indent it.
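
For the text option, here is a rough sketch: it repeatedly merges the two closest groups (average linkage, similar in spirit to UPGMA) and prints the result with indentation. The function names and the distance values are hypothetical, and the output has no branch lengths, so treat it as a starting point rather than a real phylogenetic method.

```python
from itertools import combinations


def average_linkage_tree(names, dist):
    """Merge the two closest clusters until one nested-tuple tree remains.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, leaf names)

    def cluster_distance(c1, c2):
        # Average distance over all leaf pairs of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def render(tree, depth=0):
    """Return the tree as indented text lines; '+' marks a branch point."""
    pad = "    " * depth
    if isinstance(tree, str):
        return [pad + tree]
    lines = [pad + "+"]
    for child in tree:
        lines.extend(render(child, depth + 1))
    return lines


# Made-up pairwise distances between three sequences.
distances = {
    frozenset(("seq1", "seq2")): 0.00,
    frozenset(("seq1", "seq3")): 0.10,
    frozenset(("seq2", "seq3")): 0.25,
}
tree = average_linkage_tree(["seq1", "seq2", "seq3"], distances)
print("\n".join(render(tree)))
```

Sequences that end up under the same "+" were merged first, meaning they were the closest according to the distances supplied.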
