3

My Dataframe looks like this.

id age Gender snapshot_1 performance_13 snapshot_5 performance_17 snapshot_7 performance_19
1 34 M 80 30 40 30
2 42 F 65 55 60 15 25 45

ALL Id's data need to be grouped for snapshot/performance with ID repetition like below. For Snapshot and its corresponding performance window could be fetched from the number after underscore from 1st dataframe.

ID Age Gender Snapshot_Window Performance_window Snapshot_Value Performance_Value
1 34 M 1 13
2 42 F 1 13 65 55
1 34 M 5 17 80 30
2 42 F 5 17 60 15
1 34 M 7 19 40 30
2 42 F 7 19 25 45
wjandrea
  • 28,235
  • 9
  • 60
  • 81
Abhishek Prasar
  • 87
  • 1
  • 1
  • 5

3 Answers3

3

Try:

x = pd.wide_to_long(
    df,
    stubnames="performance",
    sep="_",
    i=["id", "age", "Gender"],
    j="Performance_Window",
)[["performance"]].reset_index(level=3)

y = pd.wide_to_long(
    df,
    stubnames="snapshot",
    sep="_",
    i=["id", "age", "Gender"],
    j="Snapshot_Window",
)[["snapshot"]].reset_index(level=3)

print(
    pd.concat([x, y], axis=1)
    .reset_index()
    .rename(columns={"performance": "Performance_Value", "snapshot": "Snapshot_Value"})
)

Prints:

   id  age Gender  Performance_Window Performance_Value  Snapshot_Window Snapshot_Value
0   1   34      M                  13                                  1               
1   1   34      M                  17                30                5             80
2   1   34      M                  19                30                7             40
3   2   42      F                  13                55                1             65
4   2   42      F                  17                15                5             60
5   2   42      F                  19                45                7             25
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
2

Using a MultiIndex to reshape, to handle an arbitrary number of categories:


tmp = df.set_index(['id', 'age', 'Gender'])

idx = (tmp.columns.to_series().str.split('_', n=1, expand=True)
          .assign(n=lambda x: x.groupby(0).cumcount())
       )

out = (tmp
    .set_axis(pd.MultiIndex.from_frame(idx[[0, 'n']]), axis=1)
    .stack(dropna=False).add_suffix('_value')
    .reset_index(tmp.index.names)
    .join(idx.pivot(index='n', columns=0, values=1)
             .add_suffix('_window')
          )
    .rename_axis(index=None, columns=None)
 )

Output:

   id  age Gender  performance_value  snapshot_value performance_window snapshot_window
0   1   34      M                NaN             NaN                 13               1
0   2   42      F               55.0            65.0                 13               1
1   1   34      M               30.0            80.0                 17               5
1   2   42      F               15.0            60.0                 17               5
2   1   34      M               30.0            40.0                 19               7
2   2   42      F               45.0            25.0                 19               7
mozway
  • 194,879
  • 13
  • 39
  • 75
1

One option is with pivot_longer - in this case, you pass multiple values_to - in this case, names_pattern should be a list of regex that matches the targeted columns, while names_to should be a sequence of the same length as values_to:

(df
.pivot_longer(
    index=['id','age','Gender'], 
    names_to=('snapshot_window', 'performance_window'), 
    values_to = ('snapshot_value', 'performance_value'), 
    names_pattern = ['snapshot','performance'])
.assign(snapshot_window=lambda df: df.snapshot_window
                                     .str.split("_").str[-1], 
        performance_window=lambda df: df.performance_window
                                       .str.split("_").str[-1])
)

   id  age Gender snapshot_window  snapshot_value performance_window  performance_value
0   1   34      M               1             NaN                 13                NaN
1   2   42      F               1            65.0                 13               55.0
2   1   34      M               5            80.0                 17               30.0
3   2   42      F               5            60.0                 17               15.0
4   1   34      M               7            40.0                 19               30.0
5   2   42      F               7            25.0                 19               45.0
sammywemmy
  • 27,093
  • 4
  • 17
  • 31