Merge dataframe rows in pandas

Question

Hi I have a dataset like this:

ID	Gender	Group	Biomarker	Value
001	Male	Test	CA125	X1
001	Male	Test	CEA	X2
001	Male	Test	AFP	X3
002	Female	Control	CA125	X4
002	Female	Control	CEA	X5
002	Female	Control	AFP	X6

I want to merge all rows with the same id into a single row and change the column names to the biomarkers for example:

ID	Gender	Group	CA125	CEA	AFP
001	Male	Test	X1	X2	X3
002	Female	Control	X4	X5	X6

Some IDs might not have some of the biomarkers: I want them to have missing data in the final table*** Some IDs might have several values for a biomarker: I want them to have the minimum of the values

What should I do?

score 0 · Answer 1 · answered Feb 12 '22 at 13:59

0

You can use DataFrame.pivot() method

piv = df.pivot(index=['ID', 'Gender', 'Group'], 
               columns='Biomarker', 
               values='Value')\
               .reset_index()

answered Feb 12 '22 at 13:59

John Giorgio

634
3
10

It gives error --> ValueError: Index contains duplicate entries, cannot reshape – Soori Feb 12 '22 at 14:26
Please check whether you have more than one row with the same values of ID, Gender and Group and check if it would make sense to maintain all these rows or if only one should be kept. – John Giorgio Feb 12 '22 at 14:52

Merge dataframe rows in pandas

1 Answers1