
I have a FASTA file with sequences associated with states and their cities. Is it possible to use Python in a Jupyter notebook to run a multiple sequence alignment (MSA) with Clustal, and then build a phylogenetic tree from the aligned sequences? I am not sure where to start, and the assignment did not give any clear direction.

afaf12
rt2421
  • Hello! Welcome to Stack Overflow. Please make sure that the question clearly describes your problem. Ideally, the question should be about a single technical issue, illustrated with a code example. In this case, you could provide the contents of a simple FASTA file, show how MSA would look if performed on your FASTA file (step 1), and also the end result (step 2). – afaf12 Dec 18 '21 at 21:47
  • You should also specify what kind of MSA algorithm you want to use. – afaf12 Dec 18 '21 at 21:59
  • Please provide enough code so others can better understand or reproduce the problem. – Community Dec 25 '21 at 22:45

1 Answer


Disclaimer: I have no background in biology.

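As far as I understand, a FASTA file contains one or more named sequences of letters, and aligning them means inserting gaps ('-') so that matching letters in different sequences line up in the same columns. At its core that's string manipulation, which Python is very good at: for two sequences, you could write a function that takes 2 strings and returns them aligned. Aligning many sequences at once (an MSA) is considerably harder to do well, which is where a dedicated tool such as Clustal comes in.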
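If you do want to write that two-string function yourself, the textbook method is the Needleman-Wunsch dynamic-programming algorithm (my choice of method; your assignment may expect something else). This is a minimal sketch, assuming made-up scores of +1 for a match and -1 for a mismatch or a gap; real aligners use substitution matrices and separate gap-opening and gap-extension penalties. The function name `needleman_wunsch` is just for illustration, and it only handles two sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (aligned_a, aligned_b, score). The default scores are simple
    illustrative values, not a biological substitution matrix.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # letter vs letter
                score[i - 1][j] + gap,  # letter of a vs gap
                score[i][j - 1] + gap,  # gap vs letter of b
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = needleman_wunsch("AAATCGGAAA", "CGGA")
print(aligned_a)
print(aligned_b)
print(best)  # -2: four matches, six gaps
```

When several alignments are equally good (here the final A of CGGA could pair with any of the three trailing As), the traceback simply returns one of them.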

I found a library on GitHub, https://github.com/benchling/clustalo-python, which seems to do this; I don't know whether you are allowed to use it. The following code fragment is taken from its documentation:

from clustalo import clustalo
# Map each sequence name to its unaligned sequence
sequences = {
    'seq1': 'AAATCGGAAA',
    'seq2': 'CGGA'
}
aligned = clustalo(sequences)
# aligned is a dict of aligned sequences:
#   seq1: AAATCGGAAA
#   seq2: ----CGGA--
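The `clustalo` call above takes a dict mapping names to sequences, so the FASTA file has to be read into that shape first. A minimal pure-Python sketch, assuming each header line starts with '>' and that the first word after it is a usable name (`parse_fasta` is a hypothetical helper, not part of the library):

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            header = line[1:].strip()
            # Use the first word of the header as the name; fall back to a numbered name
            name = header.split()[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)  # a sequence may span several lines
    # Join each record's lines into a single sequence string
    return {key: ''.join(parts) for key, parts in records.items()}


example = """>seq1 first sample
AAATCG
GAAA
>seq2
CGGA
"""
print(parse_fasta(example))  # {'seq1': 'AAATCGGAAA', 'seq2': 'CGGA'}
```

In the notebook you would then read your own file (the path here is hypothetical) and pass the result to the library: `with open("my_sequences.fasta") as handle: aligned = clustalo(parse_fasta(handle.read()))`.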

Once the sequences are aligned, you can estimate how similar each pair is, group the most similar sequences together, and display the grouping as a tree.
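One way to go from aligned sequences to a tree, assuming the simple p-distance (the fraction of gap-free columns where two sequences differ) as the dissimilarity measure and UPGMA (repeatedly merge the two closest clusters, then average their distances to everything else) as the tree-building method. Both are my picks for illustration; a phylogenetics course may ask for neighbor-joining or a model-corrected distance instead. The function names are hypothetical, and the result is a Newick string, a common text format for trees.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of differing positions between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != '-' and y != '-']
    if not pairs:
        return 1.0  # nothing to compare: treat as maximally different
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Build a UPGMA tree from {name: aligned_sequence} and return it in Newick format."""
    names = list(aligned)
    # cluster id -> (Newick text, number of leaves, height of the cluster's root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): p_distance(aligned[names[i]], aligned[names[j]])
            for i, j in combinations(range(len(names)), 2)}

    def d(i, j):
        return dist[min(i, j), max(i, j)]

    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters; their common ancestor sits at half their distance
        i, j = min(combinations(sorted(clusters), 2), key=lambda pair: d(*pair))
        height = d(i, j) / 2
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        # Distance from the merged cluster to each other cluster: size-weighted average
        for k in clusters:
            dist[k, next_id] = (d(i, k) * size_i + d(j, k) * size_j) / (size_i + size_j)
        clusters[next_id] = (
            f"({text_i}:{height - h_i:.4f},{text_j}:{height - h_j:.4f})",
            size_i + size_j,
            height,
        )
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


toy = {"A": "AAAA", "B": "AAAT", "C": "TTTT"}  # toy alignment
print(upgma(toy))  # (C:0.4375,(A:0.1250,B:0.1250):0.3125);
```

With the library from the documentation, you would call `upgma(aligned)` on the dict that `clustalo` returns. Note that UPGMA assumes all lineages evolve at the same rate, which is one reason other tree-building methods are often preferred.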

You can draw inside a Jupyter notebook; an example can be seen here: Using Turtle in Google Colab. Alternatively, you could display the tree as text, using spaces, tabs, etc. to lay it out.
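For the text option, here is a minimal sketch that prints a tree stored as nested tuples (leaves are name strings) using indentation and ASCII connectors. The tuple layout and the name `tree_to_text` are assumptions made for this example; a tree produced by another tool would first need converting into this shape.

```python
def tree_to_text(tree):
    """Render a tree of nested tuples (leaves are strings) as indented ASCII text."""
    if isinstance(tree, str):
        return tree  # a single leaf is its own tree

    def branch_lines(node, prefix):
        lines = []
        for index, child in enumerate(node):
            last = index == len(node) - 1
            connector = "`-- " if last else "|-- "
            if isinstance(child, str):
                lines.append(prefix + connector + child)
            else:
                lines.append(prefix + connector + "+")  # "+" marks an internal node
                # Continue the vertical bar only if more siblings follow
                lines.extend(branch_lines(child, prefix + ("    " if last else "|   ")))
        return lines

    return "\n".join(["+"] + branch_lines(tree, ""))


# Hypothetical tree: seq1 and seq2 are each other's closest relatives
print(tree_to_text((("seq1", "seq2"), "seq3")))
# Prints:
# +
# |-- +
# |   |-- seq1
# |   `-- seq2
# `-- seq3
```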

afaf12