81

I am trying to create a stacked bar graph with pandas that replicates the picture below. All my data is separate from that Excel spreadsheet.


I can't figure out how to make a dataframe for it like the one pictured, nor can I figure out how to make the stacked bar chart. All the examples I can find work differently from what I'm trying to create.

My data is a CSV of all values, narrowed down with pandas to the following dataframe:

      Site Name    Abuse/NFF
0    NORTH ACTON       ABUSE
1    WASHINGTON         -
2    WASHINGTON        NFF
3    BELFAST            -
4    CROYDON            - 

I have managed to count the data with totals and get individual counts for each site; I just can't seem to combine them in a way that I can graph.

Would really appreciate some strong guidance.

Completed code (many thanks for the help in getting there):

test5 = faultdf.groupby(['Site Name', 'Abuse/NFF'])['Site Name'].count().unstack('Abuse/NFF').fillna(0)

test5.plot(kind='bar', stacked=True)
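
For readers without the original CSV, the completed code can be tried on a small stand-in frame. This is only a sketch: the rows are copied from the sample shown in the question, and the 'FAULTY' label is an assumption based on the asker's remark in the comments that '-' should eventually mean faulty.

```python
import pandas as pd

# Stand-in for faultdf, built from the five sample rows in the question.
faultdf = pd.DataFrame({
    'Site Name': ['NORTH ACTON', 'WASHINGTON', 'WASHINGTON', 'BELFAST', 'CROYDON'],
    'Abuse/NFF': ['ABUSE', '-', 'NFF', '-', '-'],
})

# Assumed relabelling: show '-' as 'FAULTY' so the legend is readable.
faultdf['Abuse/NFF'] = faultdf['Abuse/NFF'].replace('-', 'FAULTY')

# One row per site, one column per category, missing combinations filled with 0.
test5 = (
    faultdf.groupby(['Site Name', 'Abuse/NFF'])['Site Name']
    .count()
    .unstack('Abuse/NFF')
    .fillna(0)
)
print(test5)
```

Calling test5.plot(kind='bar', stacked=True) on this table should then draw one bar per site with one segment per category.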
Trenton McKinney
Kuzen
  • Note to readers: If you are getting the `KeyError` related to index when trying the accepted answer, use the completed code here in the question. – KobeJohn Dec 21 '16 at 05:04

5 Answers

140

Are you getting errors, or just not sure where to start?

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

df2 = df.groupby(['Name', 'Abuse/NFF'])['Name'].count().unstack('Abuse/NFF').fillna(0)
df2[['abuse','nff']].plot(kind='bar', stacked=True)

[Image: stacked bar plot]
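
If the column selection in the last line of code raises a `KeyError`, the labels in the list probably do not match the values in your data exactly. Labels are case-sensitive, so 'abuse' and 'ABUSE' are different columns. The sketch below assumes the sample rows from the question (with the site column called 'Name', as in this answer), prints the labels that actually exist, and uses `reindex` as one tolerant alternative to `df2[[...]]`. The 'OTHER' label is hypothetical and is there only to show what happens with a missing category.

```python
import pandas as pd

# Sample rows from the question, with the site column named as in this answer.
df = pd.DataFrame({
    'Name': ['NORTH ACTON', 'WASHINGTON', 'WASHINGTON', 'BELFAST', 'CROYDON'],
    'Abuse/NFF': ['ABUSE', '-', 'NFF', '-', '-'],
})

df2 = df.groupby(['Name', 'Abuse/NFF'])['Name'].count().unstack('Abuse/NFF').fillna(0)

# Inspect the real column labels before selecting any of them.
print(df2.columns.tolist())

# reindex does not raise on a missing label; it adds that column filled with 0.
wanted = ['ABUSE', 'NFF', 'OTHER']  # 'OTHER' is hypothetical, not in the data
subset = df2.reindex(columns=wanted, fill_value=0)
print(subset)
```

With this approach, subset.plot(kind='bar', stacked=True) plots only the requested categories, and a mistyped label shows up as an all-zero column rather than an exception.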

naught101
chucklukowski
  • That produces this http://i.imgur.com/hocPgWg.jpg which is not quite right; I need the stacked part to be the count of the Abuse/NFF column for each site. I'm not getting errors, I'm just struggling to get started. Cheers for the response. – Kuzen May 02 '14 at 15:25
  • I've updated my answer to include the ['Abuse/NFF'] part after the groupby function. Adding this means that the Abuse column will be the only value that is aggregated (counted in this example). – chucklukowski May 02 '14 at 15:43
  • Not working, sadly. It's basically the same graph now but without being stacked: no errors, no legend, no green. It's counting the overall totals rather than the totals of the values in the column per store, if that makes sense. – Kuzen May 02 '14 at 15:53
  • Another try. If you want to see the blanks, change the beginning of the last line to df2.plot( – chucklukowski May 02 '14 at 18:39
  • Cheers for another bash, but still no joy. I will put the code in my question above; I'm getting the error KeyError: "['ABUSE' 'NFF' '-'] not in index". I have adjusted the code to match my dataframe but can't seem to get it to work. Also, I want - in the results; I need to change - to mean faulty, I just haven't got around to it. – Kuzen May 02 '14 at 19:06
  • The first line of code produced a suitable dataframe on its own once the edits were made; many thanks for the assistance. I've added the final code to the question in case anyone else needs it. – Kuzen May 02 '14 at 20:55
48

This should help:

df.groupby(['NFF', 'ABUSE']).size().unstack().plot(kind='bar', stacked=True)
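
In the one-liner above, 'NFF' and 'ABUSE' appear to stand in for two column names. With the frame from the question, the equivalent columns would presumably be 'Site Name' and 'Abuse/NFF'. The sketch below works under that assumption, using the sample rows from the question, and shows the table that `size().unstack()` produces before it is plotted.

```python
import pandas as pd

# Sample rows from the question.
faultdf = pd.DataFrame({
    'Site Name': ['NORTH ACTON', 'WASHINGTON', 'WASHINGTON', 'BELFAST', 'CROYDON'],
    'Abuse/NFF': ['ABUSE', '-', 'NFF', '-', '-'],
})

# size() counts the rows in each (site, category) group.
# unstack(fill_value=0) pivots the categories into columns and keeps integer counts.
counts = (
    faultdf.groupby(['Site Name', 'Abuse/NFF'])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Passing `fill_value=0` to `unstack` avoids the NaN values (and the float conversion) that would otherwise appear for site and category combinations with no rows. The table can then be plotted with counts.plot(kind='bar', stacked=True).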
ahajib
Kyofa
4

You could also use the pandas crosstab function:

test5 = pd.crosstab(index=faultdf['Site Name'], columns=faultdf['Abuse/NFF'])

test5.plot(kind='bar', stacked=True)
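
As a rough check, the sketch below compares `pd.crosstab` with the groupby approach on the sample rows from the question, and shows the optional `margins=True` argument for readers who also want the totals the asker mentioned. The variable names `via_groupby` and `with_totals` are illustrative only.

```python
import pandas as pd

# Sample rows from the question.
faultdf = pd.DataFrame({
    'Site Name': ['NORTH ACTON', 'WASHINGTON', 'WASHINGTON', 'BELFAST', 'CROYDON'],
    'Abuse/NFF': ['ABUSE', '-', 'NFF', '-', '-'],
})

# crosstab counts each (site, category) pair and fills missing pairs with 0.
test5 = pd.crosstab(index=faultdf['Site Name'], columns=faultdf['Abuse/NFF'])

# The same table built with groupby, for comparison.
via_groupby = faultdf.groupby(['Site Name', 'Abuse/NFF']).size().unstack(fill_value=0)
print((test5.values == via_groupby.values).all())

# margins=True appends an 'All' row and an 'All' column holding the totals.
with_totals = pd.crosstab(
    index=faultdf['Site Name'],
    columns=faultdf['Abuse/NFF'],
    margins=True,
)
print(with_totals)
```

If the totals table is used for plotting, the 'All' row and column would normally be dropped first, for example with with_totals.drop(index='All', columns='All'), so that the totals are not stacked on top of the per-category counts.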
Trenton McKinney
3

If you want to change the size of the plot, use the figsize argument:

(df.groupby(['NFF', 'ABUSE']).size().unstack()
   .plot(kind='bar', stacked=True, figsize=(15, 5)))
kamran kausar
1
import matplotlib.pyplot as plt
cmap = plt.get_cmap('Spectral')  # Colour map (there are many others)

df.plot(kind='bar', stacked=True, figsize=(20, 10), cmap=cmap, edgecolor='None')
plt.show()

Using a colormap also avoids duplicate colors in the legend of your bar chart when there are more categories than colors in the default color cycle.

Toey