2

Description:

I have a pandas dataframe that contains two columns ID and Value.

I want to group the the ID column

and convert the group results (Value) into multiple columns with Numeric suffix as Value1, Value2, Value3 and so on based on the total results.

Example:

Current DataFrame:

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'C', 'C', 'D', 'E', 'F', 'F', 'F', 'F'], 'Value': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'] } )
df
   ID Value
0   A     A
1   A     B
2   A     C
3   B     D
4   C     E
5   C     F
6   D     G
7   E     H
8   F     I
9   F     J
10  F     K
11  F     L

Expected DataFrame:

  ID  Value1  Value2  Value3  Value4
0  A       A     B       C       NaN
1  B       D     NaN     NaN     NaN
2  C       E     F       NaN     NaN
3  D       G     NaN     NaN     NaN
4  E       H     NaN     NaN     NaN
5  F       I     J       K       L

I have tried multiple solutions with pivot table as well, but I don't get the results.

What I tried

pd.pivot(df, index='ID', columns='ID', values='Value')
High-Octane
  • 1,104
  • 5
  • 19
  • See "question 10" in the pivot question: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe – cs95 Dec 18 '20 at 23:01
  • 2
    To fix your pivot code use `df.assign(count=df.groupby('ID').cumcount()).pivot('ID', 'count', 'Value')` – cs95 Dec 18 '20 at 23:02
  • 1
    If you want a solution without `pivot` you can try this one-liner: `df.groupby(["ID"])["Value"].unique().apply(pd.Series).filter(regex="\d").add_prefix('Value').reset_index()` – Fourier Dec 18 '20 at 23:19

2 Answers2

4
df=pd.pivot_table(df, index='ID', columns=df.groupby("ID").cumcount())
df.columns = df.columns.map('{0[0]}_{0[1]}'.format)
df
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • This gives me an issue saying that `pandas.core.base.DataError: No numeric types to aggregate`. I think it happens since the values are not numeric. – High-Octane Dec 18 '20 at 23:06
  • Worked for me with the dataset you gave and should work for you. Check that your value column is not of 'object' type? the groupby operations specifically check whether each column is a numeric dtype first. – wwnde Dec 18 '20 at 23:11
  • My dtypes are set as object as of now. – High-Octane Dec 18 '20 at 23:12
  • 1
    Those cant be aggregated then. You can `df[Value]=df[Value].astype(int)` if you want to use this method. – wwnde Dec 18 '20 at 23:13
  • Okay, I will check that out as well. Thanks a lot for the inputs. Much appreciated! – High-Octane Dec 18 '20 at 23:14
3

You can create a suitable MultiIndex with groupby.cumcount and unstack

df.set_index(['ID', df.groupby('ID').cumcount()]).unstack().add_prefix('Value').droplevel(0, 1).reset_index()

Out:

  ID Value0 Value1 Value2 Value3
0  A      A      B      C    NaN
1  B      D    NaN    NaN    NaN
2  C      E      F    NaN    NaN
3  D      G    NaN    NaN    NaN
4  E      H    NaN    NaN    NaN
5  F      I      J      K      L
Michael Szczesny
  • 4,911
  • 5
  • 15
  • 32
  • Wow, This is incredible. Thanks a lot for the answer, I definitely need to have a look at MultiIndex, cumcount and unstack. – High-Octane Dec 18 '20 at 23:05