I have been using Biopython to align some amino acid sequences with Clustal Omega and then import the guide tree it generates.

from Bio.Align.Applications import ClustalOmegaCommandline
from Bio import AlignIO
from Bio import Phylo

clustalomega_cline = ClustalOmegaCommandline('/path/to/clustalo', infile=in_file,
    outfile=out_file, log=log_file, guidetree_out=guidetree_file, verbose=True,
    auto=True, force=True)
clustalomega_cline()
align = AlignIO.read(out_file, "fasta")
tree = Phylo.read(guidetree_file, "newick")
Phylo.draw(tree)

print([record.id for record in align if record.id not in
        [terminal.name for terminal in tree.get_terminals()]])

>['CTX-M-3', 'CTX-M-4', 'CTX-M-5', 'CTX-M-11', 'CTX-M-15', 'CTX-M-133']

print([terminal.name for terminal in tree.get_terminals() if
        terminal.name == None])

>[None, None, None, None, None, None]

So the imported tree now has some leaves/terminals named None, and is missing an equivalent number of named leaves.

I looked at the tree file (as written by clustalo) and noticed that the genes being renamed None always have -0 after them, e.g.:

,
(
(
CTX-M-4:-0
,
CTX-M-5:-0
):0.00171644
,
CTX-M-76:0.00171644
):0.00432852

What do the -0s mean, and how do I fix this so that all my terminals are named?

As a side note, this doesn't seem to happen when I fill my FASTA files with DNA sequences instead, align those, and import the resulting tree.

jargogler
  • You are printing terminal.name only for the terminals where it is None. If you remove `if terminal.name == None` you'll get all names, or you could use `if terminal.name != None`. – llrs Apr 16 '15 at 13:26
  • The point was that there shouldn't be any unnamed terminals. Clustalo has (via Python) created an alignment FASTA file and a Newick tree from the same input FASTA of sequences. All the terminal nodes in the Newick file are named. Those print statements show that when I've imported these with Biopython, 6 of the terminal nodes have been renamed None in the tree. Finding and replacing -0 with 0 in the Newick file seems to fix it. Annoying and odd, but oh well! – jargogler Apr 17 '15 at 14:28
  • Then did you check the biopython code? If I can I will try, but I don't promise anything... :( – llrs Apr 17 '15 at 14:33
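
For anyone wanting to script the find-and-replace workaround mentioned in the comments, here is a minimal sketch. It assumes the `-0` values are simply zero-length branches written with a negative sign (which is how they look, but this has not been confirmed against the clustalo source), and the helper name `normalize_negative_zero` is made up for illustration. It only touches branch lengths that are exactly negative zero, so a genuine value such as `-0.5` and the hyphens inside names like `CTX-M-4` are left alone.

```python
import re

# Matches ":-0", ":-0.0", ":-0.000" etc., but not ":-0.5" or ":-0.0e-5".
# The lookahead rejects anything that would continue the number.
_NEG_ZERO = re.compile(r":-(0(?:\.0*)?)(?![\d.eE])")


def normalize_negative_zero(newick_text):
    """Return the Newick text with negative-zero branch lengths made positive."""
    return _NEG_ZERO.sub(r":\1", newick_text)


# Example based on the fragment of the guide tree shown in the question.
fragment = "((CTX-M-4:-0,CTX-M-5:-0):0.00171644,CTX-M-76:0.00171644):0.00432852;"
print(normalize_negative_zero(fragment))
```

The cleaned text could then be written back to the guide tree file, or wrapped in `io.StringIO` and passed to `Phylo.read(handle, "newick")` instead of the original file.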
