How to transform a dataframe from wide to long

Question

I have the following pandas.Dataframe:

data = {'Timestamp': [12, 22],
        'bmw-series1-exhaust': [0.1, 0.5],
        'vw-series1-breaking': [0.7, 0.1],
        'vw-series2-breaking': [0.2, 0.5]}

df = pd.DataFrame(data)

For the column name transformation I use the simple function (for now):

def to_customer_series(column_name):
    return column_name.split('-')

I would like to split this dataframe into a new one with a multi-index of levels timestamp, customer and series:

I'm currently stuck on how to transform the dataframe. Would I first need to create a new 3-level nested dictionary (I tried this, SO link) out of the split data? or is there some way to use the built-in pandas functions to achieve this?

Any help is appreciated!

Trenton McKinney · Answer 1 · 2020-10-07T06:19:52.920

See inline notation

import pandas as pd

# your sample
data = {'Timestamp': [12, 22], 'bmw-series1-exhaust': [0.1, 0.5], 'vw-series1-breaking': [0.7, 0.1], 'vw-series2-breaking': [0.2, 0.5]}

df = pd.DataFrame(data)

# set Timestamp as the index, stack the columns into a long form, and reset the index
dfl = df.set_index('Timestamp').stack().reset_index()

# split the strings on the hyphen, and create new columns from each segment
dfl[['Customer', 'Series', 'Type']] = dfl.level_1.str.split('-', expand=True)

# drop the level_1 column
dfl = dfl.drop(columns=['level_1'])

# pivot the long dataframe
dfp = dfl.pivot(index=['Timestamp', 'Customer', 'Series'], columns='Type', values=0)

# remove .columns.name
dfp.columns.name = None

# display(dfp)
                            breaking  exhaust
Timestamp Customer Series                    
12        bmw      series1       NaN      0.1
          vw       series1       0.7      NaN
                   series2       0.2      NaN
22        bmw      series1       NaN      0.5
          vw       series1       0.1      NaN
                   series2       0.5      NaN

jezrael · Accepted Answer · 2020-10-07T06:09:25.237

First convert all columns to index by DataFrame.set_index with no -, create MultiIndex in columns by str.split and reshape by DataFrame.stack by first and second level, last add DataFrame.rename_axis for MultiIndex names:

df = pd.DataFrame({'Timestamp': [12, 22], 
                   'bmw-series1-exhaust': [0.1, 0.5],
                   'vw-series1-breaking': [0.7, 0.1], 
                   'vw-series2-breaking': [0.2, 0.5]})

df1 = df.set_index('Timestamp')
df1.columns = df1.columns.str.split('-', expand=True)
df1 = df1.stack([0,1]).rename_axis(['Timestamp', 'Customer','Series'])
print (df1)
                            breaking  exhaust
Timestamp Customer Series                    
12        bmw      series1       NaN      0.1
          vw       series1       0.7      NaN
                   series2       0.2      NaN
22        bmw      series1       NaN      0.5
          vw       series1       0.1      NaN
                   series2       0.5      NaN

How to transform a dataframe from wide to long

2 Answers2