
Here are my datasets:

df
    A    B      C
0  13  Yes  False
1  12   No   True
2   2  Yes   True
3  12   No  False
4   4   No   True
5   1  Yes   True
6   1   No  False
7   5   No   True
8  15  Yes  False

and

df2
    A    B      C
0  13  Yes  False
1  12   No  False
2  11   No  False
3  15  Yes  False
4  12   No  False
5  21  Yes  False

Here is the problematic code:

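import matplotlib.pyplot as plt
import pandas as pd  # df and df2 built as shown above

fig, ax = plt.subplots(2, 1, sharey="all", sharex="all")
df2.boxplot("A", by=["B", "C"], ax=ax[0])  # top: df2 has only 2 (B, C) groups
df.boxplot("A", by=["B", "C"], ax=ax[1])   # bottom: df has all 4 groups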

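which gives this box plot:

[figure: box plots of A grouped by (B, C) for df2 (top) and df (bottom), sharing the x axis]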

The problem is that, in the upper subplot, the right-hand box should be shifted one position to the right so that it lines up with the (Yes, False) label.

I think this happens because sharex aligns the x tick values (here [1, 2] and [1, 2, 3, 4]) rather than the tick labels. I can fix it by passing the positions=[1, 3] argument to df2.boxplot.
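To see the cause in isolation, here is a minimal sketch that calls matplotlib's Axes.boxplot directly on df2's two groups. It assumes that DataFrame.boxplot draws through this method and forwards extra keywords such as positions to it, as the positions=[1, 3] workaround suggests. The side-by-side unshared axes, the Agg backend, and the names ax_default / ax_shifted are choices made only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# df2 from the question: only the (No, False) and (Yes, False) groups exist.
df2 = pd.DataFrame({
    "A": [13, 12, 11, 15, 12, 21],
    "B": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "C": [False] * 6,
})

# One array of A values per (B, C) group, in groupby's sorted key order.
values = [grp["A"].to_numpy() for _, grp in df2.groupby(["B", "C"])]

# Two independent (unshared) axes so each keeps its own ticks.
fig, (ax_default, ax_shifted) = plt.subplots(1, 2)
ax_default.boxplot(values)                    # default positions: 1, 2
ax_shifted.boxplot(values, positions=[1, 3])  # explicit positions: 1, 3

default_ticks = [int(t) for t in ax_default.get_xticks()]
shifted_ticks = [int(t) for t in ax_shifted.get_xticks()]
print(default_ticks, shifted_ticks)
```

Under sharex both subplots use one x scale, so the top plot's box at x = 2 sits above whatever the bottom plot draws at x = 2, i.e. its second group.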

The question is: how can I fix this without knowing in advance which groups will be missing?
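One way to avoid hard-coding the positions, sketched here under the assumption that drawing with matplotlib directly (instead of DataFrame.boxplot) is acceptable: collect the union of (B, C) keys over all frames, give each key a fixed x position, and draw every frame's groups at their own key's position. The names all_keys, position and drawn_positions are illustrative, not part of any library.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

# The two datasets from the question.
df = pd.DataFrame({
    "A": [13, 12, 2, 12, 4, 1, 1, 5, 15],
    "B": ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No", "Yes"],
    "C": [False, True, True, False, True, True, False, True, False],
})
df2 = pd.DataFrame({
    "A": [13, 12, 11, 15, 12, 21],
    "B": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "C": [False] * 6,
})

by = ["B", "C"]
frames = [df2, df]  # top subplot first, as in the question

# Every (B, C) combination present in at least one frame, sorted like groupby sorts keys.
all_keys = sorted(set().union(*(f.groupby(by).groups.keys() for f in frames)))
# Fixed x position (1..n) for each combination, shared by all subplots.
position = {key: i for i, key in enumerate(all_keys, start=1)}

fig, axes = plt.subplots(len(frames), 1, sharex=True, sharey=True)
drawn_positions = []
for ax, frame in zip(axes, frames):
    groups = list(frame.groupby(by))            # [(key, sub-DataFrame), ...]
    values = [grp["A"].to_numpy() for _, grp in groups]
    pos = [position[key] for key, _ in groups]  # keys absent from this frame get no box
    ax.boxplot(values, positions=pos)
    drawn_positions.append(pos)

# Label the shared x axis once, from the full key list.
axes[-1].set_xticks(list(position.values()))
axes[-1].set_xticklabels(["(%s, %s)" % key for key in all_keys])
```

Because the tick labels come from the full key list, a group missing from one frame shows up as an empty slot instead of shifting the boxes to its right.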

Also, could this be a pandas or Matplotlib bug, or is this behavior expected for some reason?

jrjc
  • Did you try the answers from here: http://stackoverflow.com/questions/25284859/grouping-boxplots-in-seaborn-when-input-is-a-dataframe ? – cphlewis Apr 03 '15 at 21:13

1 Answer

import pandas as pd
import seaborn as sns

df = pd.DataFrame([[13, 'Yes', False],
                   [12, 'No', True],
                   [2, 'Yes', True],
                   [12, 'No', False],
                   [4, 'No', True],
                   [1, 'Yes', True],
                   [1, 'No', False],
                   [5, 'No', True],
                   [15, 'Yes', False]],
                  columns=list('ABC'))
df2 = pd.DataFrame([[13, 'Yes', False],
                    [12, 'No', False],
                    [11, 'No', False],
                    [15, 'Yes', False],
                    [12, 'No', False],
                    [21, 'Yes', False]],
                   columns=list('ABC'))
df['i'] = 1   # identifier: which DataFrame each row came from
df2['i'] = 2
dfb = pd.concat([df, df2], ignore_index=True)
# Combine B and C into one label such as "Yes, False" (vectorized instead of map())
dfb['B,C'] = dfb['B'].astype(str) + ', ' + dfb['C'].astype(str)
dfb2 = dfb[['A', 'i', 'B,C']]
# factorplot was renamed catplot in seaborn 0.9
sns.catplot(x='B,C', y='A', row='i', kind='box', data=dfb2)

[figure: box plots of A for each "B, C" label, one row per DataFrame (i = 1, 2)]

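I added an identifier column i to each DataFrame so the rows can be told apart after concatenation, and combined the existing B and C variables into one column so it can be passed as the x argument to factorplot. That reproduces your figure. Because all facets are drawn from the same concatenated data, factorplot uses one category order for every row, so a group missing from one DataFrame leaves a gap instead of shifting the other boxes. Letting factorplot do more of the work: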

dfc = pd.concat([df, df2], ignore_index=True)
sns.catplot(x='B', y='A', row='i', col='C', kind='box', data=dfc)  # factorplot in seaborn < 0.9

[figure: box plots of A by B, rows by DataFrame (i), columns by C]

That certainly makes clear which case doesn't have any data!

cphlewis