How to eliminate duplicate Id rows by adding columns using Python Pandas Dataframe

Question

How do I create 2 columns out of Source and place values from Capacity in them thereby eliminating the duplicate Ids. Run the below code:

df = pd.DataFrame({'Id':['a','a','b','b','c','c','d','d','e','e'],
'Source':['Group','Ranking','Ranking','Group','Group','Ranking','Group','Ranking','Group','Ranking'],
'Capacity':['Young Professionals','$1 - $199,989','$1 - $199,989','Midlife Suburbanites','Elite Families','$549,948 - $649,975','Small Town Families','$1 - $199,989', 
'Small Town Families','$1 - $199,989']})

New dataframe would be:

Id  Group                   Ranking
a   Young Professionals     $1 - $199,989
b   Midlife Suburbanites    $1 - $199,989
c   Elite Families          $549,948 - $649,975
d   Small Town Families     $1 - $199,989
e   Small Town Families     $1 - $199,989

le_camerone · Answer 1 · 2022-03-30T04:32:12.073

0

Does this work?

df.pivot(index='Id', columns='Source', values='Capacity')

edited Mar 30 '22 at 04:32

answered Mar 30 '22 at 04:22

le_camerone

630
5
17

Your solution works on my sample dataframe that I provided. So I got what I asked for. It does not work when I apply it to my larger actual dataframe. I get the error ValueError: Index contains duplicate entries, cannot reshape. So I suppose there is something wrong with my actual dataframe. I don't know why the sample I provided does not generate the same error as there are duplicates in the index column. – claghorn Mar 30 '22 at 16:49
I did find in my actual dataframe that there are some records with more than two occurrences of the same Id. This is causing the error. The sample I provided only has max two occurrences of the same Id. Do you know of a way to handle this condition? – claghorn Mar 30 '22 at 17:26
Here is the dataframe that has the occurrence of more than two Id's. Can I ask you if you know to handle this? df = pd.DataFrame({'Id':['a','a','b','b','b','c','c','d','d','e','e'], 'Source':['Group','Ranking','Ranking','Group','Group','Group','Ranking','Group','Ranking','Group','Ranking'], 'Capacity':['Young Professionals','$1 - $199,989','$1 - $199,989','Midlife Suburbanites','Elite Families','Elite Families','$549,948 - $649,975','Small Town Families','$1 - $199,989', 'Small Town Families','$1 - $199,989']}) – claghorn Mar 30 '22 at 17:33
I dropped the duplicates from the dataframe using two columns so that there are no more than two Id's in the dataframe that pertains to your solution. I applied your solution in my code as you supplied it. So I will accept your solution. – claghorn Mar 30 '22 at 18:58
I was asleep sorry. Yes, it would have to do with the duplicates. Hope it all works now. Please accept the answer if it works for you. – le_camerone Mar 30 '22 at 21:20

How to eliminate duplicate Id rows by adding columns using Python Pandas Dataframe

1 Answers1

Linked