I'm trying to customize a phylogenetic tree based on a tree file and a data frame. The tree file and the data frame share the same IDs; for example, GCA_021406745.1_ASM2140674v1 appears in both. The data frame looks like this:
GCA_000375645.1_ASM37564v1 20
GCA_900543265.1_UMGS547 20
GCA_000614355.1_ASM61435v1 7
GCA_000766005.1_ASM76600v1 7
The second column is the cluster value. I want to use this value to color the tip labels of my phylogenetic tree, for example, "1" = red, "2" = green, and so on. To do that, I'm using Toytree, a Python library for phylogenetic tree manipulation: https://toytree.readthedocs.io/en/latest/index.html
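A minimal sketch of the cluster-to-color idea, assuming the table is available as whitespace-separated text. The names cluster_of, palette, and color_of_cluster are hypothetical, and the palette colors are arbitrary placeholders:

```python
# Two-column table: genome ID, cluster value (rows copied from the data frame above).
table = """\
GCA_000375645.1_ASM37564v1 20
GCA_900543265.1_UMGS547 20
GCA_000614355.1_ASM61435v1 7
GCA_000766005.1_ASM76600v1 7
"""

# Map each genome ID to its integer cluster value.
cluster_of = {}
for line in table.splitlines():
    genome_id, cluster = line.split()
    cluster_of[genome_id] = int(cluster)

# Hypothetical palette; any list of hex color strings would do.
palette = ["#d6557c", "#5384a3", "#4daf4a", "#ff7f00"]

# Assign one color per distinct cluster value, cycling through the
# palette if there are more clusters than colors.
clusters = sorted(set(cluster_of.values()))
color_of_cluster = {c: palette[i % len(palette)] for i, c in enumerate(clusters)}
```

With more clusters than palette entries, colors would repeat, so a longer palette may be needed for a real data set.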
Specifically, I'm using tip_labels_colors to customize the labels. For example, in this example from the documentation (https://toytree.readthedocs.io/en/latest/8-styling.html#Node-labels-styling), you can do that by making a list of hex color values based on the tip labels:
colorlist = ["#d6557c" if "rex" in tip else "#5384a3" for tip in rtre.get_tip_labels()]
rtre.draw(
    tip_labels_align=True,
    tip_labels_colors=colorlist
);
That conditional expression checks whether "rex" is in the label. Now I want to do the same with my data frame, using the cluster value instead. I'm thinking of building the same kind of colorlist, but with one color for each cluster value.
I haven't managed to do that so far, so I would appreciate an idea or some pseudocode.
Here is a minimal example, using data from toytree:
import toytree
import toyplot
import numpy as np
# a tree to use for examples
url = "https://eaton-lab.org/data/Cyathophora.tre"
rtre = toytree.tree(url).root(wildcard='prz')
Using these lines, you can customize the labels of the tree with two different colors.
# make list of hex color values based on tip labels
colorlist = ["#d6557c" if "rex" in tip else "#5384a3" for tip in rtre.get_tip_labels()]
rtre.draw(
    tip_labels_align=True,
    tip_labels_colors=colorlist
);
The example colors each label depending on whether "rex" is in the label. What I need instead is to color my labels based on the cluster values in my data frame.
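One way this could work is sketched below. It assumes the tip labels match the data frame IDs exactly. The helper colors_for_tips, the default color, the two lookup dictionaries, and the "GCA_not_in_table" label are all hypothetical, and tip_labels is a stand-in for the list returned by rtre.get_tip_labels():

```python
def colors_for_tips(tip_labels, cluster_of, color_of_cluster, default="#000000"):
    """Return one hex color per tip label, in the same order as tip_labels."""
    colors = []
    for tip in tip_labels:
        cluster = cluster_of.get(tip)  # None when the tip is not in the table
        colors.append(color_of_cluster.get(cluster, default))
    return colors

# Hypothetical lookups: ID -> cluster value, and cluster value -> hex color.
cluster_of = {
    "GCA_000375645.1_ASM37564v1": 20,
    "GCA_900543265.1_UMGS547": 20,
    "GCA_000614355.1_ASM61435v1": 7,
}
color_of_cluster = {20: "#d6557c", 7: "#5384a3"}

# Stand-in for rtre.get_tip_labels(); the last label is deliberately
# missing from the table to show the fallback color.
tip_labels = [
    "GCA_000614355.1_ASM61435v1",
    "GCA_000375645.1_ASM37564v1",
    "GCA_not_in_table",
]

colorlist = colors_for_tips(tip_labels, cluster_of, color_of_cluster)

# With a real tree, colorlist would be passed as in the example above:
# rtre.draw(tip_labels_align=True, tip_labels_colors=colorlist);
```

Building the list by iterating over the tip labels, rather than over the data frame rows, keeps the colors in the same order as the labels, which is what the "rex" example relies on.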