0

I'm finding an efficient to reshape a N*M dataframe to 1*(N*M) dataframe:

INPUT>

df1

ID   distUnit col_a   col_b
1000   150      35     55
1000   250      10     20
1200   150      12     13
1200   250      16     20

DESIRED OUTPUT>

ID   col_a_150   col_b_150  col_a_250 col_b_250
1000   35            55        10          20
1200   12            13        16          20

My idea>

  1. Go through every row in df1
  2. add prefix to col_a and col_b based on the value of row['distUnit']
  3. using combined_first to add processed row back to result dataframe

Challenging part >

Since the size of my input data is 14440 * 20, my idea is not efficient enough.

Wondering any better implementation ways to solve this?

Thanks for reading.

tevemadar
  • 12,389
  • 3
  • 21
  • 49
lamiren
  • 65
  • 6
  • 1
    Split this into two DFs according to `distUnit`, then merge the two keyed by ID. That should give you a one-line operation, rather than iterating over the data frame. – Prune Mar 20 '20 at 23:12
  • Thanks for your idea. Should avoid iterating and keep separation idea in the mind – lamiren Mar 21 '20 at 01:42

1 Answers1

2

If pair (ID, distUnit) is unique across your dataset, you can simply "unmelt" your dataframe like this:

df=df.groupby(['ID','distUnit'])['col_a','col_b'].mean().unstack()
df.columns =  [f'{col[0]}_{col[1]}' for col in df.columns.values]

Check this question for similar approaches.

NRJ
  • 148
  • 2
  • 8