Flatten pandas dataframe multiindex

Question

I have a dataframe created with the following:

data = {
    'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
    'Distance': ['50', '100', '150', '50', '100', '150'],
    'WS': [10, 20, 30, 40, 50, 60],
    'DIR': [11, 21, 31, 41, 51, 61]
}

df = pd.DataFrame(data)

print(df)

                  Date Distance  WS  DIR
0  2021-06-15T00:10:00       50  10   11
1  2021-06-15T00:10:00      100  20   21
2  2021-06-15T00:10:00      150  30   31
3  2021-06-15T00:20:00       50  40   41
4  2021-06-15T00:20:00      100  50   51
5  2021-06-15T00:20:00      150  60   61

And I want to flatten it so it looks like this:

                    WS_50 DIR_50  WS_100 DIR_100 WS_150  DIR_50
Date
2021-06-15 00:10:00  10     11      20      21      30      31
2021-06-15 00:20:00  40     41      50      51      60      61

I have tried using the pivot function but that produces the following:

df['Date'] = pd.to_datetime(df['Date'])

pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])

                     WS         DIR
Distance            100 150  50 100 150  50
Date
2021-06-15 00:10:00  20  30  10  21  31  11
2021-06-15 00:20:00  50  60  40  51  61  41

As you can see, it's organized by WS and DIR, and I want it organized by Distance (50, 100, 150).

Thanks

Hi @mozway, thanks. That doesn't solve the question of sorting columns by "Distance" column — mr_vodoo, Aug 10 '23 at 07:28

score 2 · Answer 1 · answered Aug 10 '23 at 12:09

Try this:

df_out = df.set_index(['Date','Distance'], append=True).unstack()
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.groupby('Date').first()
df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
df_out

Output:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15T00:10:00   10.0    11.0    20.0     21.0    30.0     31.0
2021-06-15T00:20:00   40.0    41.0    50.0     51.0    60.0     61.0

score 1 · Accepted Answer · answered Aug 10 '23 at 12:39

You can use an integer type, sort the index, then flatten:

df['Date'] = pd.to_datetime(df['Date'])

pivot_df = (df.astype({'Distance': int})
              .pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
              .sort_index(level='Distance', sort_remaining=False, axis=1)
           )
pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')

Output:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15 00:10:00     10      11      20       21      30       31
2021-06-15 00:20:00     40      41      50       51      60       61

Flatten pandas dataframe multiindex

2 Answers2