Pandas create two new columns based on 2 existing columns

Question

I have a dataframe like the below:

dummy_dict_existing  = {'Email':['joblogs@gmail.com', 'joblogs@gmail.com'],
              'Ticket_Category': ['Tier1', 'Tier2'],
              'Quantity_Purchased': [5,2],
              'Total_Price_Paid':[1345.45, 10295.88]}

               Email Ticket_Category  Quantity_Purchased  Total_Price_Paid
0  joblogs@gmail.com           Tier1                   5           1345.45
1  joblogs@gmail.com           Tier2                   2          10295.88

What I'm trying to do is to create 2 new columns "Tier1_Quantity_Purchased" and "Tier2_Quantity_Purchased" based on the existing dataframe, and sum the total of "Total_Price_Paid" as below:

dummy_dict_desired  = {'Email':['joblogs@gmail.com'],
              'Tier1_Quantity_Purchased': [5],
               'Tier2_Quantity_Purchased':[2],
              'Total_Price_Paid':[11641.33]}

               Email  Tier1_Quantity_Purchased  Tier2_Quantity_Purchased  Total_Price_Paid
0  joblogs@gmail.com                         5                         2          11641.33

Any help would be greatly appreciated. I know there is an easy way to do this, just can't figure out how without writing some silly for loop!

Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) — BeRT2me, Jun 15 '22 at 17:09

score 2 · Accepted Answer · answered Jun 15 '22 at 17:04

2

What you want to do is to pivot your table, and then add a column with aggregated data from the original table.

df = pd.DataFrame(dummy_dict_existing)
pivot_df = df.pivot(index='Email', columns='Ticket_Category', values='Quantity_Purchased')
pivot_df['total'] = df.groupby('Email')['Total_Price_Paid'].sum()

Email	Tier1	Tier2	total
joblogs@gmail.com	5	2	11641.33

For more details on pivoting, take a look at How can I pivot a dataframe?

answered Jun 15 '22 at 17:04

Ignatius Reilly

1,594
2
6
15

I actually got an error when I tried, but I found this article which helped, I used aggfunc='sum' https://www.statology.org/valueerror-index-contains-duplicate-entries-cannot-reshape/ – Cummins070 Jun 15 '22 at 20:53

score 0 · Answer 2 · answered Jun 15 '22 at 17:07

import pandas as pd
dummy_dict_existing  = {'Email':['joblogs@gmail.com', 'joblogs@gmail.com'],
              'Ticket_Category': ['Tier1', 'Tier2'],
              'Quantity_Purchased': [5,2],
              'Total_Price_Paid':[1345.45, 10295.88]}


df = pd.DataFrame(dummy_dict_existing)
df2 = df[['Ticket_Category', 'Quantity_Purchased']]
df_transposed = df2.T
df_transposed.columns = ['Tier1_purchased', 'Tier2_purchased']
df_transposed = df_transposed.iloc[1:]
df_transposed = df_transposed.reset_index()
df_transposed = df_transposed[['Tier1_purchased', 'Tier2_purchased']]
df = df.groupby('Email')[['Total_Price_Paid']].sum()
df = df.reset_index()
df.join(df_transposed)

output

Pandas create two new columns based on 2 existing columns

2 Answers2