0

Hi I have a dataset like this:

ID Gender Group Biomarker Value
001 Male Test CA125 X1
001 Male Test CEA X2
001 Male Test AFP X3
002 Female Control CA125 X4
002 Female Control CEA X5
002 Female Control AFP X6

I want to merge all rows with the same id into a single row and change the column names to the biomarkers for example:

ID Gender Group CA125 CEA AFP
001 Male Test X1 X2 X3
002 Female Control X4 X5 X6

Some IDs might not have some of the biomarkers: I want them to have missing data in the final table*** Some IDs might have several values for a biomarker: I want them to have the minimum of the values

What should I do?

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
Soori
  • 1

1 Answers1

0

You can use DataFrame.pivot() method

piv = df.pivot(index=['ID', 'Gender', 'Group'], 
               columns='Biomarker', 
               values='Value')\
               .reset_index()
John Giorgio
  • 634
  • 3
  • 10
  • It gives error --> ValueError: Index contains duplicate entries, cannot reshape – Soori Feb 12 '22 at 14:26
  • Please check whether you have more than one row with the same values of ID, Gender and Group and check if it would make sense to maintain all these rows or if only one should be kept. – John Giorgio Feb 12 '22 at 14:52