1

Suppose there is a dataframe like this:

Something_ID Description Col1 Col2
id1 Desc1 1.1 0.1
id1 Desc2 1.1 2.1
id1 Desc3 1.1 3.1
id2 Desc1 8.1 1.1
id2 Desc4 8.1 5.1
id2 Desc2 8.1 2.1

I want to compress this dataframe into one ID one row format while creating new columns if there are different values in a column for the same ID. The output dataframe should be like this:

Something_ID Col1 Desc1_Col2 Desc2_Col2 Desc3_Col2 Desc4_Col2
id1 1.1 0.1 2.1 3.1 NaN
id2 8.1 1.1 2.1 NaN 5.1

New column names are created using 'Description' column. If each ID group has same values in a column, we keep that column without modification but keep only one record.

The first idea I got was iterating through columns. But the actual scenario has many columns and rows and that approach might take long time because it has many steps.

Since I am new to Pandas, I don't have enough knowledge on how to apply its functions. Also I could not find any helpful links related to this.

So I hope someone can help me to solve this.

Thank you in advance!

---UPDATE---

I found a helpful approach in this question: How make a dataframe in wide format creating new column names from the existing ones?

It solves part of my problem. But my other requirement is to keep the column as it is, if the ID group has same values in that particular column. Can't we do that together? Do I have to follow separate function?

Gaya3
  • 155
  • 2
  • 16
  • 1
    Try something like that: `out = df.pivot(index=['Something_ID', 'Col1'], columns='Description', values='Col2')`. You know the rest. – Corralien Jul 15 '21 at 13:34
  • @Corralien Is this possible if that dataframe has not only Col1 and Col2 but multiple columns ? – Gaya3 Jul 15 '21 at 13:41
  • 1
    Yes. columns to protect should go to `index` – Corralien Jul 15 '21 at 13:48
  • Hi! Did your query solved? if so then try considering [accepting](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235) to signal others that the issue is resolved. If not, you can provide feedback so that the answer can be improved (or removed) – Anurag Dabas Aug 14 '21 at 06:53

1 Answers1

1

selection part:

cols=df.filter(like='Col').columns
exclude=['Col1','Col4']
#These are the columns that you want to kept like Col1

Calculation part:

out=df.pivot_table(cols,'Something_ID','Description').swaplevel(axis=1)
includecols=~out.columns.get_level_values(1).isin(exclude)
newdf=out.loc[:,~includecols].droplevel(0,1).groupby(level=0,axis=1).first()
out=out.loc[:,includecols]
out.columns=out.columns.map('_'.join)
out=newdf.join(out).reset_index()

output of out:

  Something_ID      Col1    Desc1_Col2  Desc2_Col2  Desc3_Col2  Desc4_Col2
0   id1             1.1     0.1             2.1     3.1         NaN
1   id2             8.1     1.1             2.1     NaN         5.1

Sample Dataframe used by me:

df=pd.DataFrame({'Something_ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id2', 4: 'id2', 5: 'id2'},
 'Description': {0: 'Desc1',
  1: 'Desc2',
  2: 'Desc3',
  3: 'Desc1',
  4: 'Desc4',
  5: 'Desc2'},
 'Col1': {0: 1.1, 1: 1.1, 2: 1.1, 3: 8.1, 4: 8.1, 5: 8.1},
 'Col2': {0: 0.1, 1: 2.1, 2: 3.1, 3: 1.1, 4: 5.1, 5: 2.1},
 'Col3': {0: 0.3, 1: 6.3, 2: 9.3, 3: 3.3, 4: 15.3, 5: 6.3},
 'Col4': {0: 0.6, 1: 12.6, 2: 18.6, 3: 6.6, 4: 30.6, 5: 12.6}})
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41