Transposing each row data to column for each ID in dataframe

Question

My Dataframe looks like this.

id	age	Gender	snapshot_1	performance_13	snapshot_5	performance_17	snapshot_7	performance_19
1	34	M			80	30	40	30
2	42	F	65	55	60	15	25	45

ALL Id's data need to be grouped for snapshot/performance with ID repetition like below. For Snapshot and its corresponding performance window could be fetched from the number after underscore from 1st dataframe.

ID	Age	Gender	Snapshot_Window	Performance_window	Snapshot_Value	Performance_Value
1	34	M	1	13
2	42	F	1	13	65	55
1	34	M	5	17	80	30
2	42	F	5	17	60	15
1	34	M	7	19	40	30
2	42	F	7	19	25	45

Related: [How can I pivot a dataframe?](/q/47152691/4518341) (broad overview) — wjandrea, Aug 10 '23 at 16:15

score 3 · Answer 1 · answered Aug 09 '23 at 21:58

Try:

x = pd.wide_to_long(
    df,
    stubnames="performance",
    sep="_",
    i=["id", "age", "Gender"],
    j="Performance_Window",
)[["performance"]].reset_index(level=3)

y = pd.wide_to_long(
    df,
    stubnames="snapshot",
    sep="_",
    i=["id", "age", "Gender"],
    j="Snapshot_Window",
)[["snapshot"]].reset_index(level=3)

print(
    pd.concat([x, y], axis=1)
    .reset_index()
    .rename(columns={"performance": "Performance_Value", "snapshot": "Snapshot_Value"})
)

Prints:

   id  age Gender  Performance_Window Performance_Value  Snapshot_Window Snapshot_Value
0   1   34      M                  13                                  1               
1   1   34      M                  17                30                5             80
2   1   34      M                  19                30                7             40
3   2   42      F                  13                55                1             65
4   2   42      F                  17                15                5             60
5   2   42      F                  19                45                7             25

score 2 · Accepted Answer · answered Aug 10 '23 at 01:55

Using a MultiIndex to reshape, to handle an arbitrary number of categories:


tmp = df.set_index(['id', 'age', 'Gender'])

idx = (tmp.columns.to_series().str.split('_', n=1, expand=True)
          .assign(n=lambda x: x.groupby(0).cumcount())
       )

out = (tmp
    .set_axis(pd.MultiIndex.from_frame(idx[[0, 'n']]), axis=1)
    .stack(dropna=False).add_suffix('_value')
    .reset_index(tmp.index.names)
    .join(idx.pivot(index='n', columns=0, values=1)
             .add_suffix('_window')
          )
    .rename_axis(index=None, columns=None)
 )

Output:

   id  age Gender  performance_value  snapshot_value performance_window snapshot_window
0   1   34      M                NaN             NaN                 13               1
0   2   42      F               55.0            65.0                 13               1
1   1   34      M               30.0            80.0                 17               5
1   2   42      F               15.0            60.0                 17               5
2   1   34      M               30.0            40.0                 19               7
2   2   42      F               45.0            25.0                 19               7

score 1 · Answer 3 · answered Aug 10 '23 at 10:18

One option is with pivot_longer - in this case, you pass multiple values_to - in this case, names_pattern should be a list of regex that matches the targeted columns, while names_to should be a sequence of the same length as values_to:

(df
.pivot_longer(
    index=['id','age','Gender'], 
    names_to=('snapshot_window', 'performance_window'), 
    values_to = ('snapshot_value', 'performance_value'), 
    names_pattern = ['snapshot','performance'])
.assign(snapshot_window=lambda df: df.snapshot_window
                                     .str.split("_").str[-1], 
        performance_window=lambda df: df.performance_window
                                       .str.split("_").str[-1])
)

   id  age Gender snapshot_window  snapshot_value performance_window  performance_value
0   1   34      M               1             NaN                 13                NaN
1   2   42      F               1            65.0                 13               55.0
2   1   34      M               5            80.0                 17               30.0
3   2   42      F               5            60.0                 17               15.0
4   1   34      M               7            40.0                 19               30.0
5   2   42      F               7            25.0                 19               45.0

ID	Age	Gender	Snapshot_Window	Performance_window	Snapshot_Value	Performance_Value
1	34	M	1	13
2	42	F	1	13	65	55
1	34	M	5	17	80	30
2	42	F	5	17	60	15
1	34	M	7	19	40	30
2	42	F	7	19	25	45

ID	Age	Gender	Snapshot_Window	Performance_window	Snapshot_Value	Performance_Value
1	34	M	1	13
2	42	F	1	13	65	55
1	34	M	5	17	80	30
2	42	F	5	17	60	15
1	34	M	7	19	40	30
2	42	F	7	19	25	45

Transposing each row data to column for each ID in dataframe

3 Answers3

ID	Age	Gender	Snapshot_Window	Performance_window	Snapshot_Value	Performance_Value
1	34	M	1	13
2	42	F	1	13	65	55
1	34	M	5	17	80	30
2	42	F	5	17	60	15
1	34	M	7	19	40	30
2	42	F	7	19	25	45