1

I have a long data column that I would like to organize by groups.

np.random.seed(0)
data = {'unit':['a']*5+['b']*5,\
    'day':list(range(5))+list(range(5)),\
    'data':np.random.random(10)*10}
df = pd.DataFrame.from_dict(data)

The dataframe looks like this:

  unit  day      data
0    a    0  5.488135
1    a    1  7.151894
2    a    2  6.027634
3    a    3  5.448832
4    a    4  4.236548
5    b    0  6.458941
6    b    1  4.375872
7    b    2  8.917730
8    b    3  9.636628
9    b    4  3.834415

I would like to arrange it into a table by day:

  day    data a    data b
0   0  5.488135  6.458941
1   1  7.151894  4.375872
2   2  6.027634  8.917730
3   3  5.448832  9.636628
4   4  4.236548  3.834415

The goal is to facilitate plotting of "data a" and "data b" by day (pd.DataFrame.to_clipboard and paste into Excel).

stephenb
  • 1,062
  • 1
  • 12
  • 13

2 Answers2

0

pivot

Pandas has a method for this:

res = df.pivot(index='day', columns='unit')

print(res)

          data          
unit         a         b
day                     
0     5.488135  6.458941
1     7.151894  4.375872
2     6.027634  8.917730
3     5.448832  9.636628
4     4.236548  3.834415

You have a little more work to elevate your index to a series and flatten hierarchical columns, but the data is structured as you desire.

jpp
  • 159,742
  • 34
  • 281
  • 339
0

Use Pivot_table

df=df.pivot_table(index=['day'], columns='unit', values='data').reset_index().rename_axis(None, 1)
df.rename(columns={'a':'data a','b':'data b'})

df    
   day    data a    data b
0    0  5.488135  6.458941
1    1  7.151894  4.375872
2    2  6.027634  8.917730
3    3  5.448832  9.636628
4    4  4.236548  3.834415
cph_sto
  • 7,189
  • 12
  • 42
  • 78