2

I have a dataframe that looks like this:

Value     group_id
23        1
28        1
32        1 
....
35        12
23        12
42        12

There are 6 unique group_id in total, each with few hundred elements. I want to convert this into a dataframe that has the value for a single 'group_id' in one single row. So I want my dataframe to look like this:

value_1 value_2 value_3 group_id
23       28      32      1
....
35       23      42      12

I tried using pandas.melt and pandas.groupby but failed to get any results.

jeremycg
  • 24,657
  • 5
  • 63
  • 74
  • Possible duplicate of [Pandas long to wide reshape, by two variables](https://stackoverflow.com/questions/22798934/pandas-long-to-wide-reshape-by-two-variables) – andrew_reece Aug 07 '18 at 11:57

2 Answers2

3

You are missing the the position of the 'Value' within each group. We can create it with groupby.cumcount

df.set_index(
    ['group_id', df.groupby('group_id').cumcount() + 1]
).Value.unstack().add_prefix('Value_').reset_index()

   group_id  Value_1  Value_2  Value_3
0         1       23       28       32
1        12       35       23       42
piRSquared
  • 285,575
  • 57
  • 475
  • 624
2

Use groupby with cumcount to fill in your 'value_' records, and then use .pivot_table on the result:

df['val_grp'] = 'value_' + (df.groupby('group_id').cumcount() + 1).astype(str)

pvt = df.pivot_table(index='group_id', columns='val_grp', values='Value')

print(pvt)

Prints:

val_grp   value_1  value_2  value_3
group_id                           
1              23       28       32
12             35       23       42

As mentioned in the comments by andrew_reece, this can be done in a single line by doing:

pvt = df.assign(idx=(df.groupby("group_id").cumcount()+1).astype(str).str.replace("^(.)", "value_\\1", regex=True)).pivot(index="group_id", columns="idx", values="Value")

However, it should be noted that this requires Pandas 23.0 or higher, as this is when the regex parameter for str.replace was introduced.

asongtoruin
  • 9,794
  • 3
  • 36
  • 47
  • 1
    You can use `assign` to get it all on one chain: `df.assign(idx=df.groupby("group_id").cumcount().astype(str).str.replace("^(.)", "value_\\1", regex=True)).pivot(index="group_id", columns="idx", values="Value")` – andrew_reece Aug 07 '18 at 11:54
  • 1
    I like this answer and I agree with @andrew_reece. – piRSquared Aug 07 '18 at 12:29
  • @andrew_reece a monster one-liner! I've edited it into my answer (adding one to `.cumcount` to be consistent with OP's column names), thankyou very much! – asongtoruin Aug 07 '18 at 13:23
  • For some reason, this did not work with my dataset as the output using this was not in proper order and all messed up. However @piRSquared solution worked flawlessly – Bikram Baruah Aug 07 '18 at 13:27