
I have this kind of data (data.txt), a tab-delimited text file:

#genera        data1    data2
Crocinitomix    0.000103252 0
Fluviicola      2.58E-05    0
uncultured      0.000180692 0.000103252
Actibacter      2.58E-05    0
Aquibacter      0.0003  0.002503872
Litoribaculum   0.000516262 0.1
Lutibacter      2.58E-05    0
Lutimonas       5.16E-05    0.00001
Ulvibacter      0   0
uncultured      0.00240062  0
Bacteroidetes bacterium 5.16E-05    2.58E-05
bacterium       0.000129066 0

I want to create a bar chart plot like the one in the picture (an example taken from another page):

[Image: example bar chart plot]

In this case I have two samples (data1 and data2), but there could be many, and there could be hundreds or thousands of taxa (genera). Choosing a color for each one by hand would be difficult, so the color of each taxon must be assigned automatically. Does anyone have a Python script that loads a text file in this format and plots it?
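Assigning a color to each taxon automatically does not require picking colors by hand. One option is to space hues evenly around the color wheel, one per taxon, so no two taxa share a color. The sketch below uses only the standard library (`colorsys`); the function name `distinct_colors` and the saturation/value defaults are hypothetical choices for illustration, not part of any plotting library. With thousands of taxa, neighboring hues will be hard to tell apart, so this is a starting point rather than a definitive palette.

```python
import colorsys


def distinct_colors(n, saturation=0.65, value=0.9):
    """Return n hex color strings with evenly spaced hues (hypothetical helper).

    Hue i sits i/n of the way around the color wheel, so every taxon gets its
    own color; saturation and value are fixed, assumed defaults.
    """
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Map each genus name to its own automatically chosen color.
genera = ["Crocinitomix", "Fluviicola", "uncultured", "Actibacter"]
color_of = dict(zip(genera, distinct_colors(len(genera))))
print(color_of)
```

The resulting list can be passed wherever a plotting call expects one color per stacked series.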

Sorry for not posting any code; I don't know how to code in Python. I tried with QIIME, but I had to remove a lot of text (for example: D_0__Bacteria;D_1__Bacteroidetes;D_2__Flavobacteriia;D_3__Flavobacteriales;D_4__Cryomorphaceae;D_5__Fluviicola), so I wrote a Perl script to extract just the genera (D_5__). Now I just need to plot it!
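The Perl step that keeps only the genus (the `D_5__` field) could also be done in Python. This is a minimal sketch, assuming the taxonomy string uses `;` between ranks and the `D_5__` prefix for genus, exactly as in the example above; `genus_from_taxonomy` is a hypothetical name.

```python
def genus_from_taxonomy(taxonomy, level="D_5__"):
    """Return the name after the given rank prefix in a QIIME-style taxonomy string.

    Assumes ranks are separated by ';' (as in the example lineage).
    Returns None when no field carries that prefix.
    """
    for field in taxonomy.split(";"):
        field = field.strip()
        if field.startswith(level):
            return field[len(level):]
    return None


lineage = ("D_0__Bacteria;D_1__Bacteroidetes;D_2__Flavobacteriia;"
           "D_3__Flavobacteriales;D_4__Cryomorphaceae;D_5__Fluviicola")
print(genus_from_taxonomy(lineage))  # prints Fluviicola
```

Passing a different prefix (for example `"D_4__"`) would extract another rank instead.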

Thanks so much!

abraham

1 Answer


There are many ways to solve this problem; here is a solution using pandas and bokeh:

import itertools

import pandas as pd
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, Legend
from bokeh.palettes import Spectral
from bokeh.plotting import figure

# show() writes the plot to this HTML file (in the current working
# directory) and opens it in a browser
output_file("stacked.html")

# Read the tab-delimited file, using the first column (#genera) as the index.
# Replace 'bacteria.txt' with the name of your file (e.g. 'data.txt').
df = pd.read_csv('bacteria.txt', sep='\t', index_col=0)

# You have two rows with 'uncultured' data. I added these together.
# This may or may not be what you want.
# groupby(...).sum() collapses duplicate genera into one row, keeping the
# order in which each genus first appears.
df = df.groupby(level=0, sort=False).sum()

samples = list(df.columns)
organisms = list(df.index)

# create a color iterator
# See https://stackoverflow.com/q/39839409/50065
# choose an appropriate palette from
# https://docs.bokeh.org/en/latest/docs/reference/palettes.html
# if you have a large number of organisms (Spectral[11] repeats after 11)
color_iter = itertools.cycle(Spectral[11])
colors = [next(color_iter) for _ in organisms]

# create a ColumnDataSource: one column per organism, one row per sample
data = {'samples': samples}
for organism in organisms:
    data[organism] = df.loc[organism].tolist()
source = ColumnDataSource(data=data)

# create our plot
p = figure(x_range=samples, height=250, title="Species abundance",
           toolbar_location=None, tools="")

# vbar_stack returns one renderer per organism, in the same order
renderers = p.vbar_stack(organisms, x='samples', width=0.9, source=source,
                         color=colors)

p.xaxis.axis_label = 'Sample'
p.yaxis.axis_label = 'Value'

# Build the legend explicitly and position it outside the plot area
legend = Legend(items=[(name, [r]) for name, r in zip(organisms, renderers)],
                location="top_left", orientation="vertical")
p.add_layout(legend, 'right')

show(p)
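The `groupby` step above is what merges the two "uncultured" rows. To see what that merge does without pandas, here is a standard-library sketch that reads the same tab-delimited layout and sums rows sharing a genus name, keeping the order in which each genus first appears. It assumes the first line is the header (`#genera`, then one column per sample) and that every other field is numeric; `load_abundances` is a hypothetical helper, not part of pandas or bokeh.

```python
import csv
import io


def load_abundances(lines):
    """Read tab-delimited abundance rows and sum rows that share a genus name.

    Returns (samples, table): samples are the header names after '#genera',
    and table maps genus -> list of floats in first-appearance order.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # e.g. ['data1', 'data2']
    table = {}                    # dicts preserve insertion order (Python 3.7+)
    for row in reader:
        if not row:               # skip blank lines
            continue
        genus, values = row[0], [float(v) for v in row[1:]]
        if genus in table:
            # duplicate genus: add its values to the existing row
            table[genus] = [a + b for a, b in zip(table[genus], values)]
        else:
            table[genus] = values
    return samples, table


example = io.StringIO(
    "#genera\tdata1\tdata2\n"
    "uncultured\t0.000180692\t0.000103252\n"
    "Actibacter\t2.58E-05\t0\n"
    "uncultured\t0.00240062\t0\n"
)
samples, table = load_abundances(example)
print(samples)       # ['data1', 'data2']
print(list(table))   # ['uncultured', 'Actibacter']
```

For a real file, open it with `open("data.txt", newline="")` and pass the file object to `load_abundances`.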

This creates:

[Figure: Species abundance plot]

bigreddot
BioGeek
  • Sorry for not answering earlier; I had a problem with my account. Thanks so much! I tried to run the script with python3, but it did not generate the stacked.html file (I think that is the output file), and the script did not give any warning (I think all the requirements are installed). The file is tab-delimited: each column is separated by a tab. Do I have to specify the output file, or will the script generate the HTML file itself? Thanks so much! – abraham Mar 21 '18 at 15:36