How to pivot multiple columns

Question

Input Table

index	income	Education	age1to_20	pcd
1	income_1	Education_0	1	A5009
2	income_2	Education_2	1	A3450
3	income_1	Education_0	1	A5009
4	income_3	Education_1	0	A3450

How do I convert this table into

index	income_1	income_2	INCOME_3	Education_0	Education_1	Education_2	age1to_20
1	A5009	0	0	A5009	0	0	A5009
2	0	A3450	0	0	0	A3450	A3450
3	A5009	0	0	A5009	0	0	A5009
4	0	0	A3450	0	A3450	0	0

UPDATED THE OUTPUT TABLE

PaulS · Answer 1 · 2022-11-09T10:42:24.667

Another possible solution:

(pd.concat([
    df.pivot(index=['index', 'age1to_20'], columns=['income'], values='pcd'),
    df.pivot(index=['index', 'age1to_20'], columns=['Education'], values='pcd')], axis=1)
 .fillna(0).reset_index())

Output:

   index  age1to_20 income_1 income_2 income_3 Education_0 Education_1 Education_2
0      1          1    A5009        0        0       A5009           0           0
1      2          1        0    A3450        0           0           0       A3450
2      3          1    A5009        0        0       A5009           0           0
3      4          0        0        0    A3450           0       A3450           0

EDIT

In case there are a lot of columns to pivot, the following code does that, by iterating over the list of columns to pivot with map:

cols = ['income', 'Education']

(pd.concat(
    map(lambda x: df.pivot(
        index=['index', 'age1to_20'], columns=x, values='pcd'), cols), axis=1)
 .fillna(0).reset_index())

What if I have 20 columns that I need to pivot? Do I have to concat 20 pivot tables? is there a better way to do this? — nuke, Nov 09 '22 at 10:13
Yes, @nuke, but that can be fully automated: Please, see my edited solution. — PaulS, Nov 09 '22 at 10:25

mozway · Accepted Answer · 2022-11-09T14:50:12.027

2

you need to melt and pivot:

cols_to_pivot = ['income', 'Education']

cols = df.columns.difference(cols_to_pivot)
(df.melt(cols).drop(columns='variable')
   .pivot(index=['index', 'age1to_20'], columns='value', values='pcd')
   .fillna(0).reset_index()
)

output:

value  index  age1to_20 Education_0 Education_1 Education_2 income_1 income_2 income_3
0          1          1       A5009           0           0    A5009        0        0
1          2          1           0           0       A3450        0    A3450        0
2          3          1       A5009           0           0    A5009        0        0
3          4          0           0       A3450           0        0        0    A3450

reworking column age1to_20

cols_to_pivot = ['income', 'Education']

cols = df.columns.difference(cols_to_pivot)
(df.assign(age1to_20=df['pcd'].where(df['age1to_20'].eq(1), 0))
   .melt(cols).drop(columns='variable')
   .pivot(index=['index', 'age1to_20'], columns='value', values='pcd')
   .fillna(0).reset_index()
)

output:

value  index age1to_20 Education_0 Education_1 Education_2 income_1 income_2 income_3
0          1     A5009       A5009           0           0    A5009        0        0
1          2     A3450           0           0       A3450        0    A3450        0
2          3     A5009       A5009           0           0    A5009        0        0
3          4         0           0       A3450           0        0        0    A3450

edited Nov 09 '22 at 14:50

answered Nov 09 '22 at 09:54

mozway

194,879
13
39
75

what should I update to get pcd value filled for age1to_20 column? – nuke Nov 09 '22 at 14:19
what do you mean? You would like to have A5009/A3450/A5009/A3450 in place of 1/1/1/0? – mozway Nov 09 '22 at 14:22
yes. updated table. A5009/A3450/A5009/0 – nuke Nov 09 '22 at 14:44
only issue is I am getting A5009 and A3450 as new columns – nuke Nov 09 '22 at 15:27
@nuke it doesn't do it with the provided example, if you want me to debug, please provide a DataFrame constructor that reproduces the issue – mozway Nov 09 '22 at 15:34
yes. it's correct. what if there are multiple columns like 'age1to_20'. Is there a better way to write rather than writing df.assign for every single column? – nuke Nov 10 '22 at 01:43

score 1 · Answer 3 · answered Nov 09 '22 at 11:28

One option is to pivot once, run a for loop across the dataframe to get the individual dataframes, and concatenate across the columns, to get the final output:

out = df.pivot(index=['index', 'age1to_20'], 
               columns=['income', 'Education'], 
               values = 'pcd')

out = [out.droplevel(axis=1,level=n) 
       for n in range(out.columns.nlevels)]

pd.concat(out, axis = 1).fillna(0, downcast='infer').reset_index()

   index  age1to_20 Education_0 Education_2 Education_1 income_1 income_2 income_3
0      1          1       A5009           0           0    A5009        0        0
1      2          1           0       A3450           0        0    A3450        0
2      3          1       A5009           0           0    A5009        0        0
3      4          0           0           0       A3450        0        0    A3450

Gonçalo Peres · Answer 4 · 2022-11-09T16:51:25.110

pandas.get_dummies allows us to get almost the desired output, but instead of getting the values as the pcd column, one has 0 or 1 for a given pcd

import pandas as pd

df2 = pd.get_dummies(df, columns=['income', 'Education'])

[Out]:

   index  age1to_20  ... Education_Education_1  Education_Education_2
0      1          1  ...                     0                      0
1      2          1  ...                     0                      1
2      3          1  ...                     0                      0
3      4          0  ...                     1                      0

Then, in order to change the 1s with the respective pcd value (and drop the column pcd at the end), one can use pandas.Series.mask and pandas.DataFrame.drop as follows

cols = df2.columns.drop(['index', 'pcd']) # To select the columns one wants to apply the change
df2[cols] = df2[cols].mask(df2[cols] == 1, df2['pcd'], axis=0)
df2 = df2.drop(['pcd'], axis=1) 

[Out]:

   index age1to_20  ... Education_Education_1 Education_Education_2
0      1     A5009  ...                     0                     0
1      2     A3450  ...                     0                 A3450
2      3     A5009  ...                     0                     0
3      4         0  ...                 A3450                     0

Notes:

An alternative to change the 1s to the respective pcd and drop the column pcd would be to use pandas.DataFrame.assign, numpy.where and pandas.DataFrame.drop as follows

import numpy as np

df2 = df2.assign(income_1 = np.where(df2['income_income_1'] == 1, df2['pcd'], 0),
              income_2 = np.where(df2['income_income_2'] == 1, df2['pcd'], 0),
              income_3 = np.where(df2['income_income_3'] == 1, df2['pcd'], 0),
              Education_0 = np.where(df2['Education_Education_0'] == 1, df2['pcd'], 0),
              Education_1 = np.where(df2['Education_Education_1'] == 1, df2['pcd'], 0),
              Education_2 = np.where(df2['Education_Education_2'] == 1, df2['pcd'], 0),
              Age1to20 = np.where(df2['age1to_20'] == 1, df2['pcd'], 0)).drop(['age1to_20', 'income_income_1', 'income_income_2', 'income_income_3', 'Education_Education_0', 'Education_Education_1', 'Education_Education_2', 'pcd'], axis=1)

[Out]:

   index income_1 income_2  ... Education_1 Education_2 Age1to20
0      1    A5009        0  ...           0           0    A5009
1      2        0    A3450  ...           0       A3450    A3450
2      3    A5009        0  ...           0           0    A5009
3      4        0        0  ...       A3450           0        0

There are additional ways to select the columns one wants to consider in a given dataframe. For that this might be useful: Selecting multiple columns in a Pandas dataframe

How to pivot multiple columns

4 Answers4

reworking column age1to_20