
I have a pandas dataframe, df1, that looks like this:

Sample_names   esv0   esv1   esv2  esv3   ...    esv918  esv919  esv920  esv921

pr1gluc8NH1     635      0   6222     0   ...         0       0       0       0
pr1gluc8NH2    3189     75   9045     0   ...         0       0       0       0
pr1gluc8NHCR1     0   2152  12217     0   ...         0       0       0       0
pr1gluc8NHCR2     0  17411   1315     0   ...         0       1       0       0
pr1sdm8NH1      365      7   4117    32   ...         0       0       0       0
pr1sdm8NH2     4657     18  13520     0   ...         0       0       0       0
pr1sdm8NHCR1      0    139   3451     0   ...         0       0       0       0
pr1sdm8NHCR2   1130   1439   4163     0   ...         0       0       0       0

As you can see there are many zero values. I want to plot a stacked bar graph:

df1.plot(kind='bar',stacked=True)

This works and gives the right bar graph, but the legend is huge because it has an entry for each of the 922 columns. Each sample has only about 40-50 non-zero values, so in principle the legend could be much smaller. Is there a way to show legend entries only for the non-zero values? I would appreciate any help.

Note: In case it helps, I have created a dictionary in which each element is a dataframe holding one sample and its non-zero columns. For example, my dictionary v has 8 elements, each of which is a dataframe. v[0] looks like this:

Index       pr1gluc8NH1
esv2          6222
esv9          4879
esv27         2050

and so on (it has 43 non-zero rows).

v[1] has the same layout, but for the next sample. I could also use this dictionary to make the plots, if that is possible.

Kaumz
  • https://stackoverflow.com/a/35710894/8560382 – chrisckwong821 Mar 15 '19 at 08:16
  • Thanks. The `label='_nolegend_'` didn't work. (I just included it like this: `df1.plot(kind='bar',stacked=True, label='_nolegend_')` ) I'm trying to figure out how the other answer, with the `for` loop could be used for my data frame. – Kaumz Mar 15 '19 at 16:04

1 Answer

I've run into a similar problem creating a stacked bar plot and only labelling some of the data in order to avoid a huge legend. I figured it out using subplots and manually setting the label in a for loop based on a condition, taking inspiration from here and here.

I recreated a smaller version of your dataframe:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = [['pr1gluc8NH1',635,0,6222,0,23,3,543],['pr1gluc8NH2',3189,75,9045,0,55,5,66],['pr1gluc8NHCR1',0,2152,12217,0,43,67,43],['Bpr1gluc8NHCR2',0,17411,1315,0,889,56,0],['pr1sdm8NH1',365,7,4117,32,765,34,0]]
df = pd.DataFrame(data, columns = ['Sample_names','esv0','esv1','esv2','esv3','esv4','esv5','esv6'])
df = df.set_index('Sample_names')

print(df)
                esv0   esv1   esv2  esv3  esv4  esv5  esv6
Sample_names                                              
pr1gluc8NH1      635      0   6222     0    23     3   543
pr1gluc8NH2     3189     75   9045     0    55     5    66
pr1gluc8NHCR1      0   2152  12217     0    43    67    43
Bpr1gluc8NHCR2     0  17411   1315     0   889    56     0
pr1sdm8NH1       365      7   4117    32   765    34     0

Then I iterate over the dataframe and plot values from each column as a bar plot stacked on top of previous bars. Only columns without a zero value get a label added to the legend.

mycolors = sns.color_palette(n_colors=len(df.columns)) #one seaborn color per column (the palette cycles if there are many columns)
f, ax1 = plt.subplots()
x = np.arange(len(df)) #bar positions, one per sample
bot_array = np.zeros(len(df)) #running height of the bars already plotted
labels = df.columns.tolist() #get labels from column names

#Loop over dataframe and add each column as a new colored bar on top of the previous ones
#A column gets a label only if it has no zero values; an empty label is left out of the legend
for i, name in enumerate(labels):
  col = df[name].to_numpy()
  label = "" if (col == 0).any() else name
  ax1.bar(x, col, bottom=bot_array, label=label, color=mycolors[i])
  bot_array = bot_array + col

#Show the sample names on the x axis instead of bare positions
ax1.set_xticks(x)
ax1.set_xticklabels(df.index, rotation=90)
ax1.set_ylabel('Count')
ax1.legend(loc='upper right')

[Figure: stacked bar plot with limited legend]
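
If the aim is closer to the question's wording (hide columns that are zero for every sample, and keep the legend short), a possible alternative is to filter the dataframe before calling `df1.plot` and then rebuild the legend from a subset of the handles. The sketch below is only an illustration under assumptions: the helper names `drop_all_zero_columns` and `plot_with_top_legend`, the `top_n` parameter, and the rule of keeping the columns with the largest totals are hypothetical choices that appear in neither the question nor the answer. The toy data is the first three rows and four columns of the table in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def drop_all_zero_columns(df):
    """Return only the columns that are non-zero in at least one row."""
    return df.loc[:, (df != 0).any(axis=0)]


def plot_with_top_legend(df, top_n=3):
    """Stacked bar plot whose legend lists only the top_n columns by total.

    Keeping the largest totals is an assumption made for illustration;
    any other rule for choosing the labels to keep would work the same way.
    """
    ax = df.plot(kind="bar", stacked=True)
    keep = set(df.sum().nlargest(top_n).index)
    handles, labels = ax.get_legend_handles_labels()
    pairs = [(h, lab) for h, lab in zip(handles, labels) if lab in keep]
    # Calling ax.legend again replaces the full legend drawn by pandas
    ax.legend([h for h, _ in pairs], [lab for _, lab in pairs], loc="upper right")
    return ax


# First three rows and four columns of the question's table
df1 = pd.DataFrame(
    {
        "esv0": [635, 3189, 0],
        "esv1": [0, 75, 2152],
        "esv2": [6222, 9045, 12217],
        "esv3": [0, 0, 0],
    },
    index=["pr1gluc8NH1", "pr1gluc8NH2", "pr1gluc8NHCR1"],
)

nonzero = drop_all_zero_columns(df1)  # esv3 is zero everywhere and is dropped
ax = plot_with_top_legend(nonzero, top_n=2)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

Dropping all-zero columns changes nothing in the drawn bars, since a zero-height segment is invisible anyway; it only removes the matching legend entries. Whether the remaining legend is short enough depends on how many columns are non-zero in at least one sample, which is why the second helper trims the legend further.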

Cam