Merge multiple rows of the same ID into one row while creating new columns in Pandas

Question

Suppose there is a dataframe like this:

Something_ID	Description	Col1	Col2
id1	Desc1	1.1	0.1
id1	Desc2	1.1	2.1
id1	Desc3	1.1	3.1
id2	Desc1	8.1	1.1
id2	Desc4	8.1	5.1
id2	Desc2	8.1	2.1

I want to compress this dataframe into one ID one row format while creating new columns if there are different values in a column for the same ID. The output dataframe should be like this:

Something_ID	Col1	Desc1_Col2	Desc2_Col2	Desc3_Col2	Desc4_Col2
id1	1.1	0.1	2.1	3.1	NaN
id2	8.1	1.1	2.1	NaN	5.1

New column names are created using 'Description' column. If each ID group has same values in a column, we keep that column without modification but keep only one record.

The first idea I got was iterating through columns. But the actual scenario has many columns and rows and that approach might take long time because it has many steps.

Since I am new to Pandas, I don't have enough knowledge on how to apply its functions. Also I could not find any helpful links related to this.

So I hope someone can help me to solve this.

Thank you in advance!

---UPDATE---

I found a helpful approach in this question: How make a dataframe in wide format creating new column names from the existing ones?

It solves part of my problem. But my other requirement is to keep the column as it is, if the ID group has same values in that particular column. Can't we do that together? Do I have to follow separate function?

Try something like that: `out = df.pivot(index=['Something_ID', 'Col1'], columns='Description', values='Col2')`. You know the rest. — Corralien, Jul 15 '21 at 13:34
@Corralien Is this possible if that dataframe has not only Col1 and Col2 but multiple columns ? — Gaya3, Jul 15 '21 at 13:41
Hi! Did your query solved? if so then try considering [accepting](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235) to signal others that the issue is resolved. If not, you can provide feedback so that the answer can be improved (or removed) — Anurag Dabas, Aug 14 '21 at 06:53

Anurag Dabas · Answer 1 · 2021-07-15T14:13:43.880

1

selection part:

cols=df.filter(like='Col').columns
exclude=['Col1','Col4']
#These are the columns that you want to kept like Col1

Calculation part:

out=df.pivot_table(cols,'Something_ID','Description').swaplevel(axis=1)
includecols=~out.columns.get_level_values(1).isin(exclude)
newdf=out.loc[:,~includecols].droplevel(0,1).groupby(level=0,axis=1).first()
out=out.loc[:,includecols]
out.columns=out.columns.map('_'.join)
out=newdf.join(out).reset_index()

output of out:

  Something_ID      Col1    Desc1_Col2  Desc2_Col2  Desc3_Col2  Desc4_Col2
0   id1             1.1     0.1             2.1     3.1         NaN
1   id2             8.1     1.1             2.1     NaN         5.1

Sample Dataframe used by me:

df=pd.DataFrame({'Something_ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id2', 4: 'id2', 5: 'id2'},
 'Description': {0: 'Desc1',
  1: 'Desc2',
  2: 'Desc3',
  3: 'Desc1',
  4: 'Desc4',
  5: 'Desc2'},
 'Col1': {0: 1.1, 1: 1.1, 2: 1.1, 3: 8.1, 4: 8.1, 5: 8.1},
 'Col2': {0: 0.1, 1: 2.1, 2: 3.1, 3: 1.1, 4: 5.1, 5: 2.1},
 'Col3': {0: 0.3, 1: 6.3, 2: 9.3, 3: 3.3, 4: 15.3, 5: 6.3},
 'Col4': {0: 0.6, 1: 12.6, 2: 18.6, 3: 6.6, 4: 30.6, 5: 12.6}})

edited Jul 15 '21 at 14:13

answered Jul 15 '21 at 13:20

Anurag Dabas

23,866
9
21
41

Thank you so much for the quick response! This is helpful for my work. But what if that dataframe has multiple columns which have similar formats like 'col1' and 'col2'? (E.g There are 100 columns, some are like col1 and others are like col2) – Gaya3 Jul 15 '21 at 13:38
@Gaya3 so you want only Col1 and Col2? – Anurag Dabas Jul 15 '21 at 13:40
No no.. the actual scenario has many columns like col1 and col2. So I have to process all of them. – Gaya3 Jul 15 '21 at 13:42
@Corralien no friend actually I posted before jezrael Sir closed the question then I delete this answer bcz it marked as duplicate but after I realize it is not exact same then I undeleted it – Anurag Dabas Jul 15 '21 at 13:53
1

I'm agree. Can you vote to reopen the question please? – Corralien Jul 15 '21 at 13:55
@Corralien sure **:)** – Anurag Dabas Jul 15 '21 at 14:05
@Gaya3 updated answer...kindly have a look **:)** – Anurag Dabas Jul 15 '21 at 14:09

Merge multiple rows of the same ID into one row while creating new columns in Pandas

1 Answers1

Linked