
I have a pandas DataFrame with 2000 rows and 8 columns. I want to group the first 4 columns together and the last 4 columns together, but I can't figure out how. The goal is a categorical bar plot in which colors are paired across the two groups: C1 and C5 share a color, C2 and C6 share a color, and so forth.

My DataFrame:

In[1]: df.head(5)
Out[1]: 

    C1  C2  C3  C4  C5  C6  C7  C8
0   15  37  17  10  8   11  19  86
1   39  84  11  5   5   13  9   11
2   10  20  30  51  74  62  56  58
3   88  2   1   3   9   6   0   17
4   17  17  32  24  91  45  63  48

Should I add another column, such as df['Gr'], or is there a better approach?
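A minimal reproducible sketch of the frame, assuming only the five rows shown above (the real frame has 2000). Because the grouping belongs to the columns rather than to the rows, one option is to keep one group label per column instead of adding a data column like df['Gr']. The labels 'a'/'b' and the names `col_groups` and `members` are hypothetical.

```python
import pandas as pd

# Only the five sample rows from the question; the real frame has 2000 rows
df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)

# Hypothetical per-column group labels: C1..C4 -> 'a', C5..C8 -> 'b'
col_groups = pd.Series(["a"] * 4 + ["b"] * 4, index=df.columns)

# Which columns belong to each group
members = {g: col_groups.index[col_groups == g].tolist() for g in col_groups.unique()}
print(members)
```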

FaCoffee

2 Answers


You can use MultiIndex.from_arrays:

df.columns = pd.MultiIndex.from_arrays([['a'] * 4 + ['b'] * 4 , df.columns])
print (df)
    a               b            
   C1  C2  C3  C4  C5  C6  C7  C8
0  15  37  17  10   8  11  19  86
1  39  84  11   5   5  13   9  11
2  10  20  30  51  74  62  56  58
3  88   2   1   3   9   6   0  17
4  17  17  32  24  91  45  63  48
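With the two-level columns in place, selecting the outer label returns one group with that level dropped. The sketch below assumes the five sample rows; the per-row group totals are only an illustration of grouping on the outer level, and the names `first`, `second` and `totals` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)
df.columns = pd.MultiIndex.from_arrays([["a"] * 4 + ["b"] * 4, df.columns])

first = df["a"]    # outer label selected, level dropped: columns C1..C4
second = df["b"]   # columns C5..C8

# Illustration: per-row total within each group (transpose, group on outer level, transpose back)
totals = df.T.groupby(level=0).sum().T
print(totals)
```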

Then you can use xs and DataFrame.plot.bar:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1)
df.xs('a', axis=1).plot.bar(ax=axes[0])
df.xs('b', axis=1).plot.bar(ax=axes[1])
plt.show()
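To get the C1=C5, C2=C6 color pairing from the question, one option is to pass the same four-color list to both panels. A minimal sketch, assuming the five sample rows; the palette names and the Agg backend line (so it runs without a display) are choices made here, not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to plot interactively
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)
df.columns = pd.MultiIndex.from_arrays([["a"] * 4 + ["b"] * 4, df.columns])

# Same four colors in both panels, so C1/C5, C2/C6, C3/C7, C4/C8 share a color
palette = ["tab:blue", "tab:orange", "tab:green", "tab:red"]

fig, axes = plt.subplots(2, 1, sharey=True)
for ax, group in zip(axes, ["a", "b"]):
    # xs drops the outer level, so each panel gets plain C1..C4 or C5..C8 columns
    df.xs(group, axis=1).plot.bar(ax=ax, rot=0, color=palette, title=group)
```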



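import pandas as pd
import matplotlib.pyplot as plt

# Both solutions start from the original df with flat columns C1..C8

# Solution 1: add the a/b level, move it into the rows, then transpose,
# so C1..C8 are on the x-axis (legend hidden)
df.columns = pd.MultiIndex.from_arrays([['a'] * 4 + ['b'] * 4, df.columns])
df.stack(0).T.plot.bar(rot=0, legend=False)

# Solution 2: relabel the columns a/b and transpose,
# so the x-axis reads a,a,a,a,b,b,b,b with one bar per row at each tick
df.columns = ['a'] * 4 + ['b'] * 4
df.T.plot.bar(rot=0)

plt.show()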
jezrael
  • Ok, and how would you create the bar plot? – FaCoffee Sep 29 '16 at 09:26
  • I added a solution, please check it. – jezrael Sep 29 '16 at 10:11
  • Thanks! What if the intended outcome was a categorical barplot with four bars on one side (xlabel=a) and four on the other (xlabel=b), instead of 5 categories? – FaCoffee Sep 29 '16 at 14:38
  • OK, but `matplotlib` is not so easy for me, so give me some time. – jezrael Sep 29 '16 at 14:53
  • Sorry. Please take all the time you need. – FaCoffee Sep 29 '16 at 14:56
  • I am not sure I understand. Is the problem that you need one graph, not 2? Or do you need the x-axis changed to `C1-C8`, the y-axis unchanged, and the legend changed to `0-4`? Can you explain more? Thank you. – jezrael Sep 29 '16 at 15:14
  • Yes, I need one graph: two categories (a and b), with four columns in each category. – FaCoffee Sep 29 '16 at 15:18
  • So you need `df.T.plot.bar()`, but with nicer labels on the x-axis? – jezrael Sep 29 '16 at 15:23
  • OK, I have 2 solutions for you (I am not a matplotlib expert, sorry). Please check them; if something needs changing, I will try tomorrow (or post another question with the `matplotlib` tag). Now I have to go home. – jezrael Sep 29 '16 at 15:35
  • Hi, here is a more detailed question on the subject: http://stackoverflow.com/questions/39774826/pandas-how-to-draw-a-bar-plot-with-two-categories-and-four-series-each – FaCoffee Sep 29 '16 at 16:04
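For the single-graph layout asked about in the comments (two x-axis categories, a and b, with four bars each), one possible sketch reshapes the eight values of a row into a 2 x 4 frame. Each of the four columns gets its own color, so C1/C5, C2/C6 and so on match. Assumptions: only row 0 of the five sample rows is plotted (df.mean() would summarise all rows instead), and the pair labels such as "C1/C5" and the name `grouped` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() to view
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)

# Assumption: plot a single row; swap in df.mean().to_numpy() to summarise all rows
values = df.iloc[0].to_numpy()

# 8 values -> 2 groups x 4 positions: C1..C4 form row 'a', C5..C8 form row 'b'
grouped = pd.DataFrame(
    values.reshape(2, 4),
    index=["a", "b"],
    columns=["C1/C5", "C2/C6", "C3/C7", "C4/C8"],  # hypothetical pair labels
)

# Two x-ticks (a, b), four bars each; one color per column pair
ax = grouped.plot.bar(rot=0)
ax.set_xlabel("group")
```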

Use pd.concat with keys:

pd.concat([df.iloc[:, :4], df.iloc[:, 4:]], axis=1, keys=['first4', 'second4'])
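Passing keys= to pd.concat adds an outer column level, which gives the same layout as building the MultiIndex directly with MultiIndex.from_arrays. A minimal check, assuming the five sample rows; `combined` and `direct` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)

# keys= names each concatenated block as the outer column level
combined = pd.concat([df.iloc[:, :4], df.iloc[:, 4:]], axis=1, keys=["first4", "second4"])

# The same two-level columns built directly
direct = df.copy()
direct.columns = pd.MultiIndex.from_arrays([["first4"] * 4 + ["second4"] * 4, df.columns])

print(combined.equals(direct))
```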


piRSquared