0

I have a dataframe like the below:

dummy_dict_existing  = {'Email':['joblogs@gmail.com', 'joblogs@gmail.com'],
              'Ticket_Category': ['Tier1', 'Tier2'],
              'Quantity_Purchased': [5,2],
              'Total_Price_Paid':[1345.45, 10295.88]}
               Email Ticket_Category  Quantity_Purchased  Total_Price_Paid
0  joblogs@gmail.com           Tier1                   5           1345.45
1  joblogs@gmail.com           Tier2                   2          10295.88

What I'm trying to do is to create 2 new columns "Tier1_Quantity_Purchased" and "Tier2_Quantity_Purchased" based on the existing dataframe, and sum the total of "Total_Price_Paid" as below:

dummy_dict_desired  = {'Email':['joblogs@gmail.com'],
              'Tier1_Quantity_Purchased': [5],
               'Tier2_Quantity_Purchased':[2],
              'Total_Price_Paid':[11641.33]}

               Email  Tier1_Quantity_Purchased  Tier2_Quantity_Purchased  Total_Price_Paid
0  joblogs@gmail.com                         5                         2          11641.33

Any help would be greatly appreciated. I know there is an easy way to do this, just can't figure out how without writing some silly for loop!

BeRT2me
  • 12,699
  • 2
  • 13
  • 31
Cummins070
  • 63
  • 5
  • 1
    Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – BeRT2me Jun 15 '22 at 17:09

2 Answers2

2

What you want to do is to pivot your table, and then add a column with aggregated data from the original table.

df = pd.DataFrame(dummy_dict_existing)
pivot_df = df.pivot(index='Email', columns='Ticket_Category', values='Quantity_Purchased')
pivot_df['total'] = df.groupby('Email')['Total_Price_Paid'].sum()
Email Tier1 Tier2 total
joblogs@gmail.com 5 2 11641.33

For more details on pivoting, take a look at How can I pivot a dataframe?

Ignatius Reilly
  • 1,594
  • 2
  • 6
  • 15
  • I actually got an error when I tried, but I found this article which helped, I used aggfunc='sum' https://www.statology.org/valueerror-index-contains-duplicate-entries-cannot-reshape/ – Cummins070 Jun 15 '22 at 20:53
0
import pandas as pd
dummy_dict_existing  = {'Email':['joblogs@gmail.com', 'joblogs@gmail.com'],
              'Ticket_Category': ['Tier1', 'Tier2'],
              'Quantity_Purchased': [5,2],
              'Total_Price_Paid':[1345.45, 10295.88]}


df = pd.DataFrame(dummy_dict_existing)
df2 = df[['Ticket_Category', 'Quantity_Purchased']]
df_transposed = df2.T
df_transposed.columns = ['Tier1_purchased', 'Tier2_purchased']
df_transposed = df_transposed.iloc[1:]
df_transposed = df_transposed.reset_index()
df_transposed = df_transposed[['Tier1_purchased', 'Tier2_purchased']]
df = df.groupby('Email')[['Total_Price_Paid']].sum()
df = df.reset_index()
df.join(df_transposed)

output

enter image description here

Kavi Harjani
  • 661
  • 5
  • 15