How to melt a dataframe based on a pattern of the column name?

Question

I'm trying to melt a dataframe

Year | Baseline_low | Baseline_high | Overall_low | Overall_high
================================================================    
2022 |            1 |             2 |           3 |            4
2023 |            5 |             6 |           7 |            8

to this:

2022 | Baseline |  low | 1
2022 | Baseline | high | 2

There are a lot more columns that end with _low or _high than I've listed so I would like to use a pattern to split the column.

Is it possible to do this with melt?

I've seen this similar question but it has a slightly different structure to my dataframe.

score 3 · Accepted Answer · answered Sep 28 '22 at 13:58

If you aim to be pure pandas, then yes melt is they key:

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""Year | Baseline_low | Baseline_high | Overall_low | Overall_high  
2022 |            1 |             2 |           3 |            4
2023 |            5 |             6 |           7 |            8"""), sep="|")
df.columns = [column.strip() for column in df.columns]

final = df.melt(id_vars=["Year"])
final["Type"] = final["variable"].apply(lambda x: x.split("_")[0])
final["State"] = final["variable"].apply(lambda x: x.split("_")[1])

Then just split information you need based on column names, it returns:

   Year       variable  value      Type State
0  2022   Baseline_low      1  Baseline   low
1  2023   Baseline_low      5  Baseline   low
2  2022  Baseline_high      2  Baseline  high
3  2023  Baseline_high      6  Baseline  high
4  2022    Overall_low      3   Overall   low
5  2023    Overall_low      7   Overall   low
6  2022   Overall_high      4   Overall  high
7  2023   Overall_high      8   Overall  high

Naveed · Answer 2 · 2022-09-28T14:20:38.267

3

#melt
df2=df.melt('Year')

# split the columns
df2[['col','level']]=df2['variable'].str.split('_', expand=True)

# pivot
df2.pivot(index=['Year','level'], columns='col', values='value').reset_index().rename_axis(columns=None) 

#rename_axis, thanks to @mozway!

    Year    level   Baseline    Overall
0   2022    high           2    4
1   2022    low            1    3
2   2023    high           6    8
3   2023    low            5    7

edited Sep 28 '22 at 14:20

answered Sep 28 '22 at 14:14

Naveed

11,495
2
14
21

I bet you copy/pasted your other answer :p – mozway Sep 28 '22 at 14:15
its similar, not pasted though, copy yes. I was thinking of marking this question as a duplicate though – Naveed Sep 28 '22 at 14:16
@mozway, question for you, how to get rid of the 'col' , its a first level of column index – Naveed Sep 28 '22 at 14:17
1

`.rename_axis(columns=None)` – mozway Sep 28 '22 at 14:18

score 2 · Answer 3 · edited Sep 29 '22 at 02:00

You cannot achieve this with melt only. You need post-processing or to use a reshaping with a MultiIndex.

It's however a good use case for janitor's pivot_longer:

import janitor

df.pivot_longer(
    index='Year', 
    names_to=('.value', 'new'), 
    names_sep='_', 
    sort_by_appearance=True
)

output:

   Year   new  Baseline  Overall
0  2022   low         1        3
1  2022  high         2        4
2  2023   low         5        7
3  2023  high         6        8

If you do need to flip all the sub parts of the column names into new columns, then you do not need the .value placeholder. Just pass in new column names :

df.pivot_longer(
     index = 'Year', 
     names_to = ('Type', 'State'), 
     names_sep = '_')

   Year      Type State  value
0  2022  Baseline   low      1
1  2023  Baseline   low      5
2  2022  Baseline  high      2
3  2023  Baseline  high      6
4  2022   Overall   low      3
5  2023   Overall   low      7
6  2022   Overall  high      4
7  2023   Overall  high      8

score 0 · Answer 4 · answered Oct 27 '22 at 06:43

i have a normal solution

df1.set_index('Year').stack().reset_index()\
    .assign(col1=lambda dd:dd.level_1.map(lambda x:x.split('_')[0]))\
    .assign(col2=lambda dd:dd.level_1.map(lambda x:x.split('_')[1]))\
    .drop('level_1',axis=1).pipe(print)

  Year  0      col1  col2
0  2022  1  Baseline   low
1  2022  2  Baseline  high
2  2022  3   Overall   low
3  2022  4   Overall  high
4  2023  5  Baseline   low
5  2023  6  Baseline  high
6  2023  7   Overall   low
7  2023  8   Overall  high

How to melt a dataframe based on a pattern of the column name?

4 Answers4