First, please read the answer to How to make good reproducible pandas examples. In the future please provide a minimal reproducibl example.
Secondly, Michael G is absolutely correct that phylip is a format that is very peculiar about its syntax.
The code below will alow you to generate a phylogenetic tree from your Pandas dataframe.
First some imports and let's recreate your dataframe.
import pandas as pd
from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
from Bio import AlignIO
data = {'ESV' : ['esv1', 'esv2', 'esv3'],
'trunc': ['TACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGAAACTTGAGTGCAGAAGAGGAAAGCGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG',
'TACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTGCGTAGGTGGGTTGGTAAGTCAGTGGTGAAATCTCCGAGCTTAACTTGGAAACTGCCATTGATACTATTAATCTTGAATATTGTGGAGGTTAGCGGAATATGTCATGTAGCGGTGAAATGCTTAGAGATGACATAGAACACCAATTGCGAAGGCAGCTGGCTACACATATATTGACACTGAGGCACGAAAGCGTGGGGATCAAACAGG',
'TACGGGGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCCAGACCAAGTCGAGTGTGAAATTGCAGGGCTTAACTTTGCAGGGTCGCTCGATACTGGTCGGCTAGAGTGTGGAAGAGGGTACTGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAGCGGCGAAGGCGGGTACCTGGGCCAACACTGACGCTGAGGCGCGAAAGCTAGGGGAGCAAACAG']
}
ndf = pd.DataFrame.from_dict(data)
print(ndf)
Output:
ESV trunc
0 esv1 TACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGCGCG...
1 esv2 TACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTG...
2 esv3 TACGGGGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCG...
Next, write the phylip file in the correct format.
with open("test.phy", 'w') as f:
f.write("{:10} {}\n".format(ndf.shape[0], ndf.trunc.str.len()[0]))
for row in ndf.iterrows():
f.write("{:10} {}\n".format(*row[1].to_list()))
Ouput of test.phy
:
3 253
esv1 TACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGAAACTTGAGTGCAGAAGAGGAAAGCGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG
esv2 TACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTGCGTAGGTGGGTTGGTAAGTCAGTGGTGAAATCTCCGAGCTTAACTTGGAAACTGCCATTGATACTATTAATCTTGAATATTGTGGAGGTTAGCGGAATATGTCATGTAGCGGTGAAATGCTTAGAGATGACATAGAACACCAATTGCGAAGGCAGCTGGCTACACATATATTGACACTGAGGCACGAAAGCGTGGGGATCAAACAGG
esv3 TACGGGGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCCAGACCAAGTCGAGTGTGAAATTGCAGGGCTTAACTTTGCAGGGTCGCTCGATACTGGTCGGCTAGAGTGTGGAAGAGGGTACTGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAGCGGCGAAGGCGGGTACCTGGGCCAACACTGACGCTGAGGCGCGAAAGCTAGGGGAGCAAACAG
Now we can start with the creation of our phylogenetic tree.
# Read the sequences and align
aln = AlignIO.read('test.phy', 'phylip')
print(aln)
Output:
SingleLetterAlphabet() alignment with 3 rows and 253 columns
TACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGCG...AGG esv1
TACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGG...AGG esv2
TACGGGGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGG...CAG esv3
Calculate the distance matrix:
calculator = DistanceCalculator('identity')
dm = calculator.get_distance(aln)
print(dm)
Output:
esv1 0
esv2 0.3003952569169961 0
esv3 0.6086956521739131 0.6245059288537549 0
Construct the phylogenetic tree using UPGMA algorithm and draw the tree in ascii
constructor = DistanceTreeConstructor()
tree = constructor.upgma(dm)
Phylo.draw_ascii(tree)
Output:
________________________________________________________________________ esv3
_|
| ___________________________________ esv2
|____________________________________|
|___________________________________ esv1
Or make a nice plot of the tree:
Phylo.draw(tree)
Output:
