Using groupby on a Pandas DataFrame to add arbitrary number of columns and calculate values

Question

I have a pandas DataFrame that I want to convert into a timetable (for visualization purposes) by using groupby, adding an arbitrary number of columns based on hourly increments and populating them from a third column.

The source DataFrame might look like:

ID  Hour Floor          
Jay  2     34       
Jay  3     34   
Tim  0     36  
Tim  1     34
Tim  2     36
Tom  3     32
Tom  4     36
Rob  3     31
Rob  4     32
Rob  5     33
Rob  6     34
...

What I am aiming for is:

ID  HOUR_0 HOUR_1 HOUR_2 HOUR_3 HOUR_4 HOUR_5 HOUR_6...
Jay   0      0      34      34      0      0      0
Tim   36     34     36      0       0      0      0
Tom   0      0      0       32      36     0      0
Rob   0      0      0       31      32     33     34

What I can't work out (without manually constructing this with loops) is how to add an arbitrary number of columns (after a groupby operation) based on the unique values or range of hours in the first DataFrame, and then compute each column's value from the Hour and Floor columns of the first DataFrame.

Any ideas?

score 3 · Answer 1 · answered Apr 09 '18 at 17:37

Because I can't help but show how this works with pd.factorize

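import numpy as np
import pandas as pd

# Integer codes (i, j) and unique labels (r, c) for the rows and columns
i, r = pd.factorize(df.ID)
j, c = pd.factorize(df.Hour, sort=True)

# Zero-filled grid, then place each Floor value at its (ID, Hour) position
b = np.zeros((r.size, c.size), df.Floor.dtype)
b[i, j] = df.Floor.values

d = pd.DataFrame(b, r, [f'Hour_{h}' for h in c])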

d

     Hour_0  Hour_1  Hour_2  Hour_3  Hour_4  Hour_5  Hour_6
Jay       0       0      34      34       0       0       0
Tim      36      34      36       0       0       0       0
Tom       0       0       0      32      36       0       0
Rob       0       0       0      31      32      33      34
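
For readers unfamiliar with pd.factorize, here is a minimal sketch of what it returns, which is what makes the `b[i, j] = ...` assignment above work. The `ids` Series is a made-up sample for illustration, not the asker's full data.

```python
import pandas as pd

# Hypothetical sample column, for illustration only
ids = pd.Series(["Jay", "Jay", "Tim", "Tim", "Tim", "Tom"])

# codes: for each value, its integer position among the unique values
#        (numbered in order of first appearance unless sort=True is passed)
# uniques: the distinct values themselves
codes, uniques = pd.factorize(ids)

print(codes.tolist())    # [0, 0, 1, 1, 1, 2]
print(uniques.tolist())  # ['Jay', 'Tim', 'Tom']
```

The codes serve as row (or column) positions in the zero-filled array, and the uniques become the index (or column labels) of the resulting DataFrame.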

score 3 · Answer 2 · answered Apr 09 '18 at 17:42

Isn't this a simple pivot?

df.pivot(*df.columns).fillna(0).add_prefix('Hour_')
Out[71]: 
Hour  Hour_0  Hour_1  Hour_2  Hour_3  Hour_4  Hour_5  Hour_6
ID                                                          
Jay      0.0     0.0    34.0    34.0     0.0     0.0     0.0
Rob      0.0     0.0     0.0    31.0    32.0    33.0    34.0
Tim     36.0    34.0    36.0     0.0     0.0     0.0     0.0
Tom      0.0     0.0     0.0    32.0    36.0     0.0     0.0

BENY

Yes, should have thought about it. Unzipping may cause problems if order is different. And might need to add `.astype(int)`. Upvoted – Anton vBR Apr 09 '18 at 17:46
@AntonvBR yep, should add astype(int) here :-) – BENY Apr 09 '18 at 17:48

Anton vBR · Accepted Answer · 2018-04-09T18:16:26.610

You are looking for unstack(). But first we need to set_index():

df = df.set_index(['ID','Hour']).unstack(fill_value=0).add_prefix('HOUR_')
df.columns = df.columns.get_level_values(1)

Or using pivot as suggested by Wen:

df = (df.pivot(index='ID', columns='Hour', values='Floor')
        .fillna(0)
        .astype(int)
        .add_prefix('HOUR_'))

Full example:

from io import StringIO

import pandas as pd

data = '''\
ID  Hour Floor
Jay  2     34
Jay  3     34
Tim  0     36
Tim  1     34
Tim  2     36
Tom  3     32
Tom  4     36
Rob  3     31
Rob  4     32
Rob  5     33
Rob  6     34'''

# Recreate dataframe (io.StringIO is used because pd.compat.StringIO
# is not available in newer pandas versions)
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Apply solution
df = df.set_index(['ID','Hour']).unstack(fill_value=0).add_prefix('HOUR_')
df.columns = df.columns.get_level_values(1)

df is now:

     HOUR_0  HOUR_1  HOUR_2  HOUR_3  HOUR_4  HOUR_5  HOUR_6
ID                                                         
Jay       0       0      34      34       0       0       0
Rob       0       0       0      31      32      33      34
Tim      36      34      36       0       0       0       0
Tom       0       0       0      32      36       0       0
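
The question also mentions building columns from a range of hours rather than only the hours present in the data. A minimal sketch of one way to do that with the pivot variant is to reindex the columns before filling. The name `all_hours` and the 0 to 8 range are assumptions chosen for illustration.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "ID":    ["Jay", "Jay", "Tim", "Tim", "Tim", "Tom", "Tom",
              "Rob", "Rob", "Rob", "Rob"],
    "Hour":  [2, 3, 0, 1, 2, 3, 4, 3, 4, 5, 6],
    "Floor": [34, 34, 36, 34, 36, 32, 36, 31, 32, 33, 34],
})

# Hypothetical full range of hours to display (0 through 8 here)
all_hours = range(0, 9)

wide = (
    df.pivot(index="ID", columns="Hour", values="Floor")
      .reindex(columns=all_hours)  # add columns for hours absent from the data
      .fillna(0)                   # missing (ID, Hour) pairs become 0
      .astype(int)                 # fillna leaves floats, so cast back to int
      .add_prefix("HOUR_")
)

print(wide)
```

Hours 7 and 8 do not occur in the data, so their columns are all zeros. Note that pivot raises an error if any (ID, Hour) pair appears more than once, so this sketch assumes the pairs are unique.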