Suppose there is a dataframe like this:
Something_ID | Description | Col1 | Col2 |
---|---|---|---|
id1 | Desc1 | 1.1 | 0.1 |
id1 | Desc2 | 1.1 | 2.1 |
id1 | Desc3 | 1.1 | 3.1 |
id2 | Desc1 | 8.1 | 1.1 |
id2 | Desc4 | 8.1 | 5.1 |
id2 | Desc2 | 8.1 | 2.1 |
I want to compress this dataframe into one ID one row format while creating new columns if there are different values in a column for the same ID. The output dataframe should be like this:
Something_ID | Col1 | Desc1_Col2 | Desc2_Col2 | Desc3_Col2 | Desc4_Col2 |
---|---|---|---|---|---|
id1 | 1.1 | 0.1 | 2.1 | 3.1 | NaN |
id2 | 8.1 | 1.1 | 2.1 | NaN | 5.1 |
New column names are created using 'Description' column. If each ID group has same values in a column, we keep that column without modification but keep only one record.
The first idea I got was iterating through columns. But the actual scenario has many columns and rows and that approach might take long time because it has many steps.
Since I am new to Pandas, I don't have enough knowledge on how to apply its functions. Also I could not find any helpful links related to this.
So I hope someone can help me to solve this.
Thank you in advance!
---UPDATE---
I found a helpful approach in this question: How make a dataframe in wide format creating new column names from the existing ones?
It solves part of my problem. But my other requirement is to keep the column as it is, if the ID group has same values in that particular column. Can't we do that together? Do I have to follow separate function?