I've managed to plot subplots from a groupby. I have two columns, 'A' and 'B', and I want to plot them as subplots (one per value in 'B') with their respective averages. I prepare my data by counting, dropping the duplicates, and then summing (if there is a more elegant way to do it, please let me know!).

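import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([[1, 'cat1'], [1, 'cat1'], [4, 'cat2'], [3, 'cat1'], [5, 'cat1'], [1, 'cat2']], columns=['A', 'B'])
df = df[['A', 'B']]
# Count how many times each (A, B) pair occurs
df['count'] = df.groupby(['A', 'B'])['A'].transform('count')
# Keep one row per (A, B) pair, then index by (A, B)
df = df.drop_duplicates(['A', 'B'])
df = df.groupby(['A', 'B']).sum()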

Then I unstack it and plot it with subplots:

plot = df.unstack().plot(kind='bar',subplots=True, sharex=True, sharey=True, layout = (3,3), legend=False)
plt.show(block=True)

I would like to add the mean for each category, but I don't know: 1. How to calculate the mean. If I calculate it on the unstacked groupby, I get the mean of the count rather than of the value 'A'. 2. Once I have the mean value, how to plot it on the same subplot.

Any help is welcomed :)

--

Edit following Nickil Maveli's answer: What I'm trying to achieve is to plot bars of the grouped values on A, and to plot a vertical line with the mean value on B. So using the graphs from Nickil Maveli, this would be: [image]

From what I've found on stackexchange, I think I should be using plt.axvline(mean, color='r', linestyle='--'). However, I don't know how to draw a different mean on each subplot.

Mike Atomat
  • Can you add a sample of the data? Please check [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jezrael Aug 24 '16 at 07:41
  • the line `df = df_plot_zoom_cs.drop_duplicates(['A','B'])` completely overwrites `df` and nothing prior to that matters anymore. This is indicative of you not verifying that the code you've posted works. Please see http://stackoverflow.com/help/mcve for guidance on how to post a question. – piRSquared Aug 24 '16 at 07:44
  • @piRSquared - I think it is only a typo: the original dataframe is called `df_plot_zoom_cs` and the OP forgot to change it. – jezrael Aug 24 '16 at 07:46
  • @jezrael I should be nicer ;-) – piRSquared Aug 24 '16 at 07:46
  • oops, yes, will change that! – Mike Atomat Aug 24 '16 at 08:00
  • Thank you. Can you add a mean column? When I try to aggregate, the sum and mean columns are the same: `df = df.groupby(['A','B']).agg([sum, 'mean'])` – jezrael Aug 24 '16 at 08:30

1 Answer

IIUC, you can start from the original df (before the counting steps in the question) and use agg with count and mean to compute the counts and averages beforehand.

# Named aggregation; the dict-renaming form agg({'counts': 'count'}) was removed in pandas 1.0
df_1 = df.groupby(['A', 'B'])['A'].agg(counts='count').reset_index()
df_2 = df.groupby('B')['A'].agg(average='mean').reset_index()
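
As a quick check of what these two aggregations amount to on the question's sample data, here is a minimal sketch. It swaps in size() and mean() for agg, which is my substitution rather than the method used above, and the names counts and means are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'cat1'], [1, 'cat1'], [4, 'cat2'], [3, 'cat1'], [5, 'cat1'], [1, 'cat2']],
    columns=['A', 'B'],
)

# Number of rows for each (A, B) pair, as a wide table with one column per category.
counts = df.groupby(['A', 'B']).size().unstack(fill_value=0)

# Mean of A within each category of B.
means = df.groupby('B')['A'].mean()

print(counts)
print(means)
```

On this data both category means come out to 2.5, which matches the value mentioned in the comments.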

Then merge the two with DataFrame.merge on column B, since it is the column common to both groupby results, and drop the duplicated entries among columns A and B.

df = df_1.merge(df_2, on='B').drop_duplicates(['A', 'B'])
df.drop('average', axis=1, inplace=True)
df = df.groupby(['A','B']).sum()

Make modifications to the second dataframe to let column A take the mean values.

df_2['A'] = df_2['average']
df_2 = df_2.groupby(['A','B']).sum()

Using layout and targeting multiple axes.

fig, ax = plt.subplots(2, 2, figsize=(8, 8))

target1 = [ax[0][0], ax[0][1]]
target2 = [ax[1][0], ax[1][1]]

Count groupby plot.

df.unstack().plot(kind='bar', subplots=True, rot=0, xlim=(0,5), ax=target1,
                  ylim=(0,3), layout=(2,2), legend=False)

Mean groupby plot.

df_2.unstack().plot(kind='bar', width=0.005, subplots=True, rot=0, xlim=(0,5), ax=target2,
                    ylim=(0,3), layout=(2,2), legend=False, color='k')

Adjusting the spacing between subplots.

plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()

[Image: the resulting figure, with the count plots in the top row and the mean plots in the bottom row]
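
For the goal described in the question's edit (count bars over A with a vertical line at each category's mean), one possible sketch is to skip the pandas bar plot and draw with matplotlib directly, so that the x axis stays numeric and axvline lands at the right place. This is an alternative approach under my own assumptions, not the method above: plot_counts_with_mean is a hypothetical helper name, and the bar width and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [[1, 'cat1'], [1, 'cat1'], [4, 'cat2'], [3, 'cat1'], [5, 'cat1'], [1, 'cat2']],
    columns=['A', 'B'],
)


def plot_counts_with_mean(data):
    """Hypothetical helper: one subplot per value of B, with a line at the mean of A."""
    categories = sorted(data['B'].unique())
    fig, axes = plt.subplots(1, len(categories), sharex=True, sharey=True,
                             figsize=(4 * len(categories), 4), squeeze=False)
    axes = axes.ravel()
    for ax, cat in zip(axes, categories):
        values = data.loc[data['B'] == cat, 'A']
        counts = values.value_counts().sort_index()
        # ax.bar places each bar at its numeric x value, so the axis keeps its scale.
        ax.bar(counts.index, counts.values, width=0.8)
        # Dashed vertical line at this category's mean of A.
        ax.axvline(values.mean(), color='r', linestyle='--')
        ax.set_title(cat)
        ax.set_xlabel('A')
    axes[0].set_ylabel('count')
    return fig, axes


fig, axes = plot_counts_with_mean(df)
# Call plt.show() or fig.savefig(...) to view the result.
```

Because each subplot computes its own mean inside the loop, every category gets its own line without any merging or reshaping of the counts.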

Nickil Maveli
  • Thanks, almost there :) You helped me see why I was not clear: I want to get the count on 'A' and the 'mean' on 'B', as below: `df['count'] = df.groupby(['A','B'])['A'].transform('count')` `df['mean'] = df.groupby(['A', 'B'])['B'].transform('mean')`. Both the means for the dummy data are 2.5, so I would like to plot a vertical line on both subplots at 2.5. – Mike Atomat Aug 24 '16 at 09:29
  • @MikeAtomat: Please have a look at my *edited response* and see if that's indeed what you wanted. – Nickil Maveli Aug 24 '16 at 12:45
  • Thanks! I edited my question to put a picture. I'm trying to have a line on the same graph :) – Mike Atomat Aug 24 '16 at 13:04
  • I've got you close to what you need. I think you can take it forward from here :-) – Nickil Maveli Aug 24 '16 at 16:55
  • wow, great, thanks! I managed to put them on the same graph (using the same target), but now the two sets of graphs (plotted on the same subplot) use different start and end values for the x axis, even though they have identical xlim and ylim in the plotting line of code. I also had to set `df_2['B'] = maxcount`, otherwise they were too small :) I tried using `ax[0][0].set_xlim(xmin=0, xmax=14)`. My A values are between 7 and 21, and their averages between 11 and 13; now they are plotting but not at the right place. Any clue? :) – Mike Atomat Aug 25 '16 at 08:04
  • So I figured out that the problem is that a bar plot expects categories on the x axis, which means it's not directly possible to do it this way (as far as I know): it was plotting one bar per tick of the x axis rather than following the 'scale' of the axis. Still looking for a solution :) – Mike Atomat Aug 25 '16 at 15:16
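
The behaviour described in the last comment can be seen in a small sketch. As far as I understand it, a pandas bar plot puts the bars at consecutive integer positions and only uses the index values as tick labels, which is why a line drawn at a raw A value ends up in the wrong place. The counts below are made up for illustration; only the range of A values (7 to 21) is taken from the comments.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative counts indexed by made-up A values (7, 12, 21).
counts = pd.Series([2, 1, 1], index=[7, 12, 21])

fig, ax = plt.subplots()
counts.plot(kind='bar', ax=ax)

positions = [int(t) for t in ax.get_xticks()]
labels = [t.get_text() for t in ax.get_xticklabels()]
print(positions)  # where the bars actually sit on the x axis
print(labels)     # the index values, shown only as tick labels
```

Drawing the bars with matplotlib's ax.bar instead takes numeric x positions, so a call like ax.axvline(mean) would then share the same scale as the bars.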