I have been using Biopython to align some amino acid sequences with Clustal Omega and then import the guide tree it generates:
from Bio.Align.Applications import ClustalOmegaCommandline
from Bio import AlignIO
from Bio import Phylo
clustalomega_cline = ClustalOmegaCommandline('/path/to/clustalo', infile=in_file,
    outfile=out_file, log=log_file, guidetree_out=guidetree_file, verbose=True,
    auto=True, force=True)
clustalomega_cline()
align = AlignIO.read(out_file, "fasta")
tree = Phylo.read(guidetree_file, "newick")
Phylo.draw(tree)
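# Alignment records whose IDs do not appear among the tree's terminal names
terminal_names = set(terminal.name for terminal in tree.get_terminals())
print([record.id for record in align if record.id not in terminal_names])
>['CTX-M-3', 'CTX-M-4', 'CTX-M-5', 'CTX-M-11', 'CTX-M-15', 'CTX-M-133']
# Terminals that were imported without a name
print([terminal.name for terminal in tree.get_terminals() if terminal.name is None])
>[None, None, None, None, None, None]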
So the imported tree now has some leaves/terminals named None, and is missing an equivalent number of named leaves.
I looked at the tree file (as written by clustalo) and noticed that the genes being renamed to None always have :-0 after them, e.g.:
,
(
(
CTX-M-4:-0
,
CTX-M-5:-0
):0.00171644
,
CTX-M-76:0.00171644
):0.00432852
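To check that the :-0 entries line up exactly with the missing names, a quick sketch like the following could list every label that is followed by a negative-zero branch length. The function name `labels_with_negative_zero` and the regular expression are illustrative assumptions, not part of Biopython or clustalo, and the sample string is just the fragment above with a closing semicolon added.

```python
import re

# A label (any run of characters that is not Newick punctuation or whitespace)
# followed by a branch length of negative zero, e.g. ":-0" or ":-0.0".
# The lookahead rejects real negative lengths such as ":-0.5".
_LABEL_NEG_ZERO = re.compile(r"([^\s()\[\]':;,]+)\s*:\s*-0+(?:\.0*)?(?![\d.eE])")


def labels_with_negative_zero(newick_text):
    """Return the labels whose branch length is written as negative zero."""
    return _LABEL_NEG_ZERO.findall(newick_text)


sample = """(
(
CTX-M-4:-0
,
CTX-M-5:-0
):0.00171644
,
CTX-M-76:0.00171644
):0.00432852;"""

print(labels_with_negative_zero(sample))
```

Running this on the full contents of the guide tree file and comparing the result with the list of missing IDs would show whether the two sets really coincide.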
What do the -0s mean, and how do I fix this so that all my terminals are named?
As a side note, this does not seem to happen when my FASTA file contains DNA sequences instead: after aligning those, the tree imports with all terminals named.
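A minimal sketch of a possible workaround, assuming (unconfirmed) that the :-0 branch lengths are what the Newick parser fails on: rewrite negative-zero lengths as plain zero before handing the text to the parser. The helper name `normalize_negative_zero` and the regular expression are hypothetical, introduced only for illustration.

```python
import re

# Matches a negative-zero branch length such as ":-0", ":-0.0" or ":-0.000".
# The lookahead leaves genuinely negative lengths (":-0.5", ":-0e-3") untouched.
_NEG_ZERO = re.compile(r":\s*-(0+(?:\.0*)?)(?![\d.eE])")


def normalize_negative_zero(newick_text):
    """Return newick_text with every negative-zero branch length made positive."""
    return _NEG_ZERO.sub(r":\1", newick_text)


print(normalize_negative_zero("((CTX-M-4:-0,CTX-M-5:-0):0.00171644,CTX-M-76:0.00171644);"))
```

The cleaned text could then be parsed from memory instead of from the file, for example with `Phylo.read(StringIO(cleaned), "newick")`, where `StringIO` comes from the `io` module on Python 3. This only hides the symptom; it does not explain why clustalo writes -0 in the first place.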