0

I'm trying to melt a dataframe

Year | Baseline_low | Baseline_high | Overall_low | Overall_high
================================================================    
2022 |            1 |             2 |           3 |            4
2023 |            5 |             6 |           7 |            8

to this:

2022 | Baseline |  low | 1
2022 | Baseline | high | 2

There are a lot more columns that end with _low or _high than I've listed so I would like to use a pattern to split the column.

Is it possible to do this with melt?

I've seen this similar question but it has a slightly different structure to my dataframe.

Chris Snow
  • 23,813
  • 35
  • 144
  • 309

4 Answers4

3

If you aim to be pure pandas, then yes melt is they key:

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""Year | Baseline_low | Baseline_high | Overall_low | Overall_high  
2022 |            1 |             2 |           3 |            4
2023 |            5 |             6 |           7 |            8"""), sep="|")
df.columns = [column.strip() for column in df.columns]

final = df.melt(id_vars=["Year"])
final["Type"] = final["variable"].apply(lambda x: x.split("_")[0])
final["State"] = final["variable"].apply(lambda x: x.split("_")[1])

Then just split information you need based on column names, it returns:

   Year       variable  value      Type State
0  2022   Baseline_low      1  Baseline   low
1  2023   Baseline_low      5  Baseline   low
2  2022  Baseline_high      2  Baseline  high
3  2023  Baseline_high      6  Baseline  high
4  2022    Overall_low      3   Overall   low
5  2023    Overall_low      7   Overall   low
6  2022   Overall_high      4   Overall  high
7  2023   Overall_high      8   Overall  high
jlandercy
  • 7,183
  • 1
  • 39
  • 57
3
#melt
df2=df.melt('Year')

# split the columns
df2[['col','level']]=df2['variable'].str.split('_', expand=True)

# pivot
df2.pivot(index=['Year','level'], columns='col', values='value').reset_index().rename_axis(columns=None) 

#rename_axis, thanks to @mozway!


    Year    level   Baseline    Overall
0   2022    high           2    4
1   2022    low            1    3
2   2023    high           6    8
3   2023    low            5    7
Naveed
  • 11,495
  • 2
  • 14
  • 21
2

You cannot achieve this with melt only. You need post-processing or to use a reshaping with a MultiIndex.

It's however a good use case for janitor's pivot_longer:

import janitor

df.pivot_longer(
    index='Year', 
    names_to=('.value', 'new'), 
    names_sep='_', 
    sort_by_appearance=True
)

output:

   Year   new  Baseline  Overall
0  2022   low         1        3
1  2022  high         2        4
2  2023   low         5        7
3  2023  high         6        8

If you do need to flip all the sub parts of the column names into new columns, then you do not need the .value placeholder. Just pass in new column names :

df.pivot_longer(
     index = 'Year', 
     names_to = ('Type', 'State'), 
     names_sep = '_')

   Year      Type State  value
0  2022  Baseline   low      1
1  2023  Baseline   low      5
2  2022  Baseline  high      2
3  2023  Baseline  high      6
4  2022   Overall   low      3
5  2023   Overall   low      7
6  2022   Overall  high      4
7  2023   Overall  high      8
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
mozway
  • 194,879
  • 13
  • 39
  • 75
0
i have a normal solution

df1.set_index('Year').stack().reset_index()\
    .assign(col1=lambda dd:dd.level_1.map(lambda x:x.split('_')[0]))\
    .assign(col2=lambda dd:dd.level_1.map(lambda x:x.split('_')[1]))\
    .drop('level_1',axis=1).pipe(print)

  Year  0      col1  col2
0  2022  1  Baseline   low
1  2022  2  Baseline  high
2  2022  3   Overall   low
3  2022  4   Overall  high
4  2023  5  Baseline   low
5  2023  6  Baseline  high
6  2023  7   Overall   low
7  2023  8   Overall  high
G.G
  • 639
  • 1
  • 5