2

I have a pandas DataFrame I want to to convert into a time table (for visualization purposes) by using groupby and adding an arbitrary number of columns based on hour time increments, and populating the data from a 3rd column.

The source DataFrame might look like:

ID  Hour Floor          
Jay  2     34       
Jay  3     34   
Tim  0     36  
Tim  1     34
Tim  2     36
Tom  3     32
Tom  4     36
Rob  3     31
Rob  4     32
Rob  5     33
Rob  6     34
...

What I am aiming for is:

ID  HOUR_0 HOUR_1 HOUR_2 HOUR_3 HOUR_4 HOUR_5 HOUR_6...
Jay   0      0      34      34      0      0      0
Tim   36     34     36      0       0      0      0
Tom   0      0      0       32      36     0      0
Rob   0      0      0       31      32     33     34

What I can't get (without manually constructing this using loops) is adding an arbitrary number of columns (after a groupby operation) based on the unique or range of hours in the first DataFrame, and then calculating each column value based on on the Hour and Floor columns from the first DataFrame.

Any ideas?

crackernutter
  • 203
  • 1
  • 3
  • 12

3 Answers3

3

Because I can't help but show how this works with pd.factorize

i, r = pd.factorize(df.ID)
j, c = pd.factorize(df.Hour, sort=True)
b = np.zeros((r.size, c.size), df.Floor.dtype)

b[i, j] = df.Floor.values

d = pd.DataFrame(b, r, [f'Hour_{h}' for h in c])

d

     Hour_0  Hour_1  Hour_2  Hour_3  Hour_4  Hour_5  Hour_6
Jay       0       0      34      34       0       0       0
Tim      36      34      36       0       0       0       0
Tom       0       0       0      32      36       0       0
Rob       0       0       0      31      32      33      34
piRSquared
  • 285,575
  • 57
  • 475
  • 624
3

Is this simple pivot ?

df.pivot(*df.columns).fillna(0).add_prefix('Hour_')
Out[71]: 
Hour  Hour_0  Hour_1  Hour_2  Hour_3  Hour_4  Hour_5  Hour_6
ID                                                          
Jay      0.0     0.0    34.0    34.0     0.0     0.0     0.0
Rob      0.0     0.0     0.0    31.0    32.0    33.0    34.0
Tim     36.0    34.0    36.0     0.0     0.0     0.0     0.0
Tom      0.0     0.0     0.0    32.0    36.0     0.0     0.0
BENY
  • 317,841
  • 20
  • 164
  • 234
2

You are looking for unstack(). But first we need to set_index():

df = df.set_index(['ID','Hour']).unstack(fill_value=0).add_prefix('HOUR_')
df.columns = df.columns.get_level_values(1)

Or using pivot as suggested by Wen:

df = (df.pivot(index='ID', columns='Hour', values='Floor')
        .fillna(0)
        .astype(int)
        .add_prefix('HOUR_'))

Full example:

import pandas as pd

data = '''\
ID  Hour Floor          
Jay  2     34       
Jay  3     34   
Tim  0     36  
Tim  1     34
Tim  2     36
Tom  3     32
Tom  4     36
Rob  3     31
Rob  4     32
Rob  5     33
Rob  6     34'''

# Recreate dataframe
df = pd.read_csv(pd.compat.StringIO(data), sep='\s+')

# Apply solution
df = df.set_index(['ID','Hour']).unstack(fill_value=0).add_prefix('HOUR_')
df.columns = df.columns.get_level_values(1)

Df is now:

     HOUR_0  HOUR_1  HOUR_2  HOUR_3  HOUR_4  HOUR_5  HOUR_6
ID                                                         
Jay       0       0      34      34       0       0       0
Rob       0       0       0      31      32      33      34
Tim      36      34      36       0       0       0       0
Tom       0       0       0      32      36       0       0
Anton vBR
  • 18,287
  • 5
  • 40
  • 46