1

I have the following pandas.Dataframe:

data = {'Timestamp': [12, 22],
        'bmw-series1-exhaust': [0.1, 0.5],
        'vw-series1-breaking': [0.7, 0.1],
        'vw-series2-breaking': [0.2, 0.5]}

df = pd.DataFrame(data)

enter image description here

For the column name transformation I use the simple function (for now):

def to_customer_series(column_name):
    return column_name.split('-')

I would like to split this dataframe into a new one with a multi-index of levels timestamp, customer and series: enter image description here

I'm currently stuck on how to transform the dataframe. Would I first need to create a new 3-level nested dictionary (I tried this, SO link) out of the split data? or is there some way to use the built-in pandas functions to achieve this?

Any help is appreciated!

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
user7638008
  • 137
  • 9

2 Answers2

1
  • See inline notation
import pandas as pd

# your sample
data = {'Timestamp': [12, 22], 'bmw-series1-exhaust': [0.1, 0.5], 'vw-series1-breaking': [0.7, 0.1], 'vw-series2-breaking': [0.2, 0.5]}

df = pd.DataFrame(data)

# set Timestamp as the index, stack the columns into a long form, and reset the index
dfl = df.set_index('Timestamp').stack().reset_index()

# split the strings on the hyphen, and create new columns from each segment
dfl[['Customer', 'Series', 'Type']] = dfl.level_1.str.split('-', expand=True)

# drop the level_1 column
dfl = dfl.drop(columns=['level_1'])

# pivot the long dataframe
dfp = dfl.pivot(index=['Timestamp', 'Customer', 'Series'], columns='Type', values=0)

# remove .columns.name
dfp.columns.name = None

# display(dfp)
                            breaking  exhaust
Timestamp Customer Series                    
12        bmw      series1       NaN      0.1
          vw       series1       0.7      NaN
                   series2       0.2      NaN
22        bmw      series1       NaN      0.5
          vw       series1       0.1      NaN
                   series2       0.5      NaN
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
1

First convert all columns to index by DataFrame.set_index with no -, create MultiIndex in columns by str.split and reshape by DataFrame.stack by first and second level, last add DataFrame.rename_axis for MultiIndex names:

df = pd.DataFrame({'Timestamp': [12, 22], 
                   'bmw-series1-exhaust': [0.1, 0.5],
                   'vw-series1-breaking': [0.7, 0.1], 
                   'vw-series2-breaking': [0.2, 0.5]})

df1 = df.set_index('Timestamp')
df1.columns = df1.columns.str.split('-', expand=True)
df1 = df1.stack([0,1]).rename_axis(['Timestamp', 'Customer','Series'])
print (df1)
                            breaking  exhaust
Timestamp Customer Series                    
12        bmw      series1       NaN      0.1
          vw       series1       0.7      NaN
                   series2       0.2      NaN
22        bmw      series1       NaN      0.5
          vw       series1       0.1      NaN
                   series2       0.5      NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252