I'm fairly new to Python, pandas and matplotlib/seaborn, so please be a little patient.
I have a dataframe with 65k rows, and I am trying to plot it as a stacked bar chart.
I have used these initial settings (the chart looks worse without them). I have tried removing them one at a time to see whether I can do without seaborn and make my troubleshooting easier, but it seems I really do need them all to make the display even halfway decent:
#required libraries:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
#for some improved visuals
import seaborn as sns
from pylab import rcParams
#this line enables the plots to be embedded into the notebook
%matplotlib inline
# Set some options as I have been used to having them
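# Full option names; the short forms can be ambiguous in newer pandas versions
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 40)
pd.set_option('display.max_rows', 20)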
sns.set(style="ticks")
matplotlib.style.use('ggplot')
rcParams['figure.figsize'] = 15, 10
rcParams['font.size'] = 20
rcParams['axes.facecolor'] = 'white'
My code to group the data looks like this:
HouseholdIncomeVSOccupation = workingdata.groupby(['house INCOME'
, 'OCCUPATION_M'])['house INCOME'].count().unstack('OCCUPATION_M')
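To show the shape of the table this produces, here is a minimal sketch on made-up data. The `toy` dataframe and its values are invented for illustration; only the two column names come from my real data. It uses `size()` instead of selecting a column and calling `count()`, which should give the same counts as long as the columns have no missing values.

```python
import pandas as pd

# Made-up stand-in for workingdata; only the column names match the real dataframe
toy = pd.DataFrame({
    "house INCOME": ["<50k", "<50k", "50-100k", "50-100k", "50-100k", ">100k"],
    "OCCUPATION_M": ["Clerical", "Trades", "Clerical", "Clerical", "Manager", "Manager"],
})

# size() counts rows per (income, occupation) pair;
# unstack() then moves the occupations into columns, filling missing pairs with 0
counts = (
    toy.groupby(["house INCOME", "OCCUPATION_M"])
    .size()
    .unstack("OCCUPATION_M", fill_value=0)
)
print(counts)
```

The result has one row per income band and one column per occupation, which is the layout the stacked bar plot expects.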
My code to plot the chart looks like this:
colors = ['#0066CC', '#33FF33', '#FF99CC', '#FDEBD0', '#CC9933'
, '#FF0000', 'black', '#3333FF', 'grey', '#CC66FF'
, '#339900','#FF3399','#FFFF66','#990000']
HouseholdIncomeVSOccupation.plot(kind='bar', stacked=True, color=colors)
I wanted to add a title:
plt.title('Household Income VS Occupation')
I don't quite understand why my plotting code is making it a subplot. I do realise that, because it is a subplot, I end up with the empty plot above with the title on it.
I would like to format a couple of things:
- I really just want the title to show on my first chart, so if someone can tell me how to set a title on the subplot rather than on the chart as a whole, that would work OK. Better still would be to tell me how to make this the main chart rather than a subplot.
- I would like to move the legend outside the chart, ideally positioned below the chart area with the entries laid out left to right over a couple of lines (much as Excel does it). This might be too hard (as in, too much code to make it work well), in which case putting the legend outside the chart on the right-hand side would be the next best option (hopefully this is a simple one-liner).
- I would like to label each of the axes.
I really appreciate how helpful the community on here is, and I'm quite enjoying my journey of discovery with Python. I just need to get some of these things working more quickly than I can by learning them the way I have been so far. I am certainly loving being able to work quickly and easily with dataframes in the millions of rows.
EDIT: Here is the working code after getting ImportanceOfBeingErnest's answer:
colors = ['#0066CC', '#33FF33', '#FF99CC', '#FDEBD0', '#CC9933',
'#FF0000', 'black', '#3333FF', 'grey', '#CC66FF', '#339900',
'#FF3399','#FFFF66','#990000']
ax = HouseholdIncomeVSOccupation.plot(kind='bar', stacked=True, color=colors)
ax.set_title('Household Income VS Occupation')
ax.set_xlabel('Household Income')
ax.set_ylabel('Count')
plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
I followed the link to his other excellent answer on how to format legends, which gave me the last line of the working code.
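For the layout I originally asked about (legend below the chart, entries running left to right over a couple of rows), something along these lines appears to work. This is only a sketch under assumptions: `toy_counts` is made-up data standing in for `HouseholdIncomeVSOccupation`, and the `-0.25` offset and `ncol=2` are values I picked by eye that will likely need adjusting for other figure sizes and label lengths.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for HouseholdIncomeVSOccupation
toy_counts = pd.DataFrame(
    {"Clerical": [5, 3, 1], "Trades": [2, 4, 2], "Manager": [0, 2, 6], "Sales": [1, 1, 3]},
    index=["<50k", "50-100k", ">100k"],
)

# plot() returns the Axes, so the title and labels go on this chart directly
ax = toy_counts.plot(kind="bar", stacked=True)
ax.set_title("Household Income VS Occupation")
ax.set_xlabel("Household Income")
ax.set_ylabel("Count")

# Pin the legend's top centre just below the axes and spread the entries over 2 columns
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.25), ncol=2)
```

Raising `ncol` puts more entries on each row. When saving to a file rather than displaying inline, passing `bbox_inches="tight"` to `savefig` should keep the legend from being clipped.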