3

I have a dataframe:

df = AG_Speed AG_wolt AB_Speed AB_wolt C1 C2 C3
       1         2      3         4     6  7  8
       1         9      2         6     4  1  8

And I want to pivot it based on prefix to get:

df = Speed Wolt C1 C2 C3 Category
      1      2   6 7  8    AG
      3      4   6 7  8    AB
      1      9   4 1  8    AG
      2      6   4 1  8    AG 

What is the best way to do it?

Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
Cranjis
  • 1,590
  • 8
  • 31
  • 64

2 Answers2

2

We can use pd.wide_to_long for this. But since it expects the column names to start with the stubnames, we have to reverse the column format:

df.columns = ["_".join(col.split("_")[::-1]) for col in df.columns]
res = pd.wide_to_long(
    df, 
    stubnames=["Speed", "wolt"], 
    i=["C1", "C2", "C3"], 
    j="Category", 
    sep="_", 
    suffix="[A-Za-z]+"
).reset_index()
   C1  C2  C3 Category  Speed  wolt
0   6   7   8       AG      1     2
1   6   7   8       AB      3     4
2   4   1   8       AG      1     9
3   4   1   8       AB      2     6

If you want the columns in a specific order, use DataFrame.reindex:

res.reindex(columns=["Speed", "wolt", "C1", "C2", "C3", "Category"])
   Speed  wolt  C1  C2  C3 Category
0      1     2   6   7   8       AG
1      3     4   6   7   8       AB
2      1     9   4   1   8       AG
3      2     6   4   1   8       AB
Erfan
  • 40,971
  • 8
  • 66
  • 78
0

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(index = ['C1', 'C2', 'C3'], 
                names_to = ('Category', '.value'), 
                names_sep='_')

   C1  C2  C3 Category  Speed  wolt
0   6   7   8       AG      1     2
1   4   1   8       AG      1     9
2   6   7   8       AB      3     4
3   4   1   8       AB      2     6

In the above solution, the .value determines which parts of the column labels remain as headers - the labels are split apart with the names_sep.

sammywemmy
  • 27,093
  • 4
  • 17
  • 31