
I'm fairly new to Python, pandas and matplotlib/seaborn, so please bear with me.

I have a dataframe with 65k rows

I am trying to plot this in a stacked bar chart

I have used these initial settings; without them the chart looks worse. I tried removing them one at a time to see whether I could do without seaborn and make my troubleshooting easier, but it seems I really do need them all for the chart to look even halfway decent.

#required libraries:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

#for some improved visuals
import seaborn as sns
from matplotlib import rcParams

#this line enables the plots to be embedded into the notebook
%matplotlib inline

# Set some options as I have been used to having them 
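# full option names; the short forms can be ambiguous or missing in newer pandas
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 40)
pd.set_option('display.max_rows', 20)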
sns.set(style="ticks")
matplotlib.style.use('ggplot')
rcParams['figure.figsize'] = 15, 10
rcParams['font.size'] = 20
rcParams['axes.facecolor'] = 'white'

My Code to group the data looks like this:

HouseholdIncomeVSOccupation = (
    workingdata.groupby(['house INCOME', 'OCCUPATION_M'])['house INCOME']
    .count()
    .unstack('OCCUPATION_M')
)
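To make the shape of the grouped data concrete, here is a minimal sketch on a tiny made-up frame. The `toy` data and its values are hypothetical stand-ins for `workingdata`, and the sketch uses `.size()` rather than selecting a column and calling `.count()`; both count rows per group here.

```python
import pandas as pd

# Hypothetical toy data standing in for `workingdata` (values are invented)
toy = pd.DataFrame({
    "house INCOME": ["low", "low", "high", "high", "high"],
    "OCCUPATION_M": ["clerk", "nurse", "clerk", "clerk", "nurse"],
})

# Count rows per (income, occupation) pair, then pivot occupations into columns
counts = (
    toy.groupby(["house INCOME", "OCCUPATION_M"])
    .size()
    .unstack("OCCUPATION_M", fill_value=0)
)
print(counts)
```

Each row of `counts` becomes one bar and each column becomes one stacked segment when the frame is plotted with `kind='bar', stacked=True`.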

My code to plot the chart looks like this:

colors = ['#0066CC', '#33FF33', '#FF99CC', '#FDEBD0', '#CC9933'
, '#FF0000', 'black', '#3333FF', 'grey', '#CC66FF'
, '#339900','#FF3399','#FFFF66','#990000']

HouseholdIncomeVSOccupation.plot(kind='bar', stacked=True, color=colors) 

Which gives this result: [image not shown]

I wanted to add a title

plt.title('Household Income VS Occupation')

and that gave this result: [image not shown]

I don't quite understand why my plotting code makes the chart a subplot. I do realise that, because it is a subplot, I get the empty plot above it with the title on it.

I would like to format a couple of things:

  1. I really just want the title to show on my first chart, so if someone can tell me how to set a title on the subplot rather than on the figure as a whole, that would work OK. Better still would be to tell me how to make this the main chart rather than a subplot.
  2. I would like to move the legend outside the chart, ideally positioned below the chart area with the entries laid out left to right over a couple of lines (much as Excel does it). This might be too hard (as in too many bits of code to make it work well), in which case putting the legend outside the chart on the right-hand side would be the next best option (hopefully this is a simple one-liner).
  3. I would like to label each of the axes.

I really appreciate how helpful the community here is, and I'm quite enjoying my journey of discovery with Python. I just need to get some of these things working more quickly than I would by learning the way I have so far. I'm certainly loving being able to work quickly and easily with dataframes of millions of rows.

EDIT: Here is the working code after applying ImportanceOfBeingErnest's answer:

colors = ['#0066CC', '#33FF33', '#FF99CC', '#FDEBD0', '#CC9933',
'#FF0000', 'black', '#3333FF', 'grey', '#CC66FF', '#339900',
'#FF3399','#FFFF66','#990000']

ax = HouseholdIncomeVSOccupation.plot(kind='bar', stacked=True, color=colors)
ax.set_title('Household Income VS Occupation')
ax.set_xlabel('Household Income')
ax.set_ylabel('Count')
plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

I followed the link to his other excellent answer on how to format legends. That gave the last line of working code...

kiltannen

2 Answers


A little better is to tell me how to make this the main chart and not be a subplot.

try

import matplotlib.pyplot as plt
plt.figure()

To position your legend, use the bbox_to_anchor argument. The plot area runs from 0 to 1 in both x and y (axes coordinates), so anything below the chart has a negative y and anything to the left of it has a negative x. You can add columns with ncol and stretch the legend across the anchor box with mode='expand'.

legend outside the chart, ideally positioned below the chart space with the entries laid out left to right and on a couple of lines

plt.legend(bbox_to_anchor=(0., -0.3, 1., 0.1), loc="upper left", mode="expand", ncol=2)

legend outside the chart on the right hand

plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")

I would like to name each of the Axis

plt.xlabel('Household Income')
plt.ylabel('Count')
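As a rough sketch of the "legend below the chart" layout described above: the toy `df`, the `-0.15` offset and the `bottom=0.3` margin are illustrative assumptions that will need tuning for the real data and figure size.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy counts: rows are income bands, columns are occupations
df = pd.DataFrame(
    {"clerk": [3, 5], "nurse": [2, 4], "driver": [1, 6], "teacher": [4, 2]},
    index=["low", "high"],
)

ax = df.plot(kind="bar", stacked=True)

# Anchor the legend's upper centre just below the axes;
# a negative y in axes coordinates lies under the plot area
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=2)

# Leave room at the bottom of the figure so the legend is not clipped
ax.figure.subplots_adjust(bottom=0.3)
```

Raising `ncol` spreads the entries over fewer, wider rows, which is closer to the Excel-style layout asked for in the question.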
litepresence
  • This does look promising. I tried it and don't think I did it quite right. I've edited my attempt into the question above. Or maybe plt.figure() isn't doing quite what's needed to get rid of the subplot. – kiltannen Mar 24 '18 at 04:04
  • I did try the suggestion for moving the legend as well, but that didn't work either. I figure once we get the heading and axis labels to work the expected way, I can get the legend to shift as well. – kiltannen Mar 24 '18 at 04:10

You have two options:

  • Plot the dataframe to the existing axes. In this case you do not have an axes handle, so you can use the current axes:

    df.plot(..., ax=plt.gca())
    
  • Create the plot first, and only afterwards modify it:

    ax = df.plot(...)
    ax.set_title(...)
    ax.set_xlabel(...)
    

For how to get the legend out of the axes, see How to put the legend out of the plot.
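The two options can be sketched side by side as follows. This is only an illustration: the toy `df` is hypothetical, and the title and labels are the ones used in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy counts standing in for the grouped dataframe
df = pd.DataFrame({"clerk": [3, 5], "nurse": [2, 4]}, index=["low", "high"])

# Option 1: create a figure, then draw into its current axes,
# so pyplot calls such as plt.title act on the same axes
fig = plt.figure()
ax1 = df.plot(kind="bar", stacked=True, ax=plt.gca())
plt.title("Household Income VS Occupation")

# Option 2: let pandas create the figure and axes,
# then modify the plot through the returned axes handle
ax2 = df.plot(kind="bar", stacked=True)
ax2.set_title("Household Income VS Occupation")
ax2.set_xlabel("Household Income")
ax2.set_ylabel("Count")
```

In both cases the title ends up on the axes that holds the bars, so no second, empty plot is produced.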

ImportanceOfBeingErnest