I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'doc': ['john', 'john', 'john', 'mary', 'mary', 'mary'], 'token': [1, 2, 3, 4, 5, 6]})

How do I reshape it into this?

df = pd.DataFrame({'john': [1, 2, 3], 'mary': [4, 5, 6]})

I've tried pivot, pivot_table, stack, and unstack but had no success.

Use groupby with cumcount to create a dummy index that numbers the rows within each doc, then use pivot to get the expected DataFrame:

>>> (df.assign(index=df.groupby('doc').cumcount())
       .pivot(index='index', columns='doc', values='token')
       .rename_axis(index=None, columns=None))

   john  mary
0     1     4
1     2     5
2     3     6
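To make the dummy-index step concrete, here is a minimal sketch that shows the intermediate column before pivoting. It rebuilds the question's df so it runs on its own; the variable names position and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'doc': ['john', 'john', 'john', 'mary', 'mary', 'mary'],
                   'token': [1, 2, 3, 4, 5, 6]})

# cumcount numbers the rows within each 'doc' group: 0, 1, 2, ...
# That within-group position becomes the shared row label after pivoting.
position = df.groupby('doc').cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 2]

# Row labels come from 'index', column labels from 'doc', cells from 'token'.
wide = (df.assign(index=position)
          .pivot(index='index', columns='doc', values='token')
          .rename_axis(index=None, columns=None))
print(wide)
```

Without the position column, pivot would have no row key that lines up john's first token with mary's first token, which is why plain pivot/unstack attempts fail here.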

Update: as suggested by @Chrysophylaxs, the same result with pivot_table, passing the cumcount Series directly as the index:

>>> (df.pivot_table(columns="doc", index=df.groupby("doc").cumcount(), values="token")
       .rename_axis(columns=None))

   john  mary
0     1     4
1     2     5
2     3     6
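Note that pivot_table aggregates (its default aggfunc is 'mean'), so it only reproduces the pivot result when every (position, doc) cell holds exactly one value. The sketch below checks that assumption on the question's data and compares the two variants; via_pivot and via_pivot_table are illustrative names, and check_dtype=False is used so the comparison does not hinge on whether each path keeps an integer dtype.

```python
import pandas as pd

df = pd.DataFrame({'doc': ['john', 'john', 'john', 'mary', 'mary', 'mary'],
                   'token': [1, 2, 3, 4, 5, 6]})
position = df.groupby('doc').cumcount()

# Each (position, doc) pair occurs once, so the 'mean' aggregation
# in pivot_table has a single value per cell and changes nothing.
assert not df.assign(position=position).duplicated(['position', 'doc']).any()

via_pivot = (df.assign(index=position)
               .pivot(index='index', columns='doc', values='token')
               .rename_axis(index=None, columns=None))
via_pivot_table = (df.pivot_table(columns='doc', index=position, values='token')
                     .rename_axis(columns=None))

# Compare labels and values; ignore int vs float dtype differences.
pd.testing.assert_frame_equal(via_pivot, via_pivot_table, check_dtype=False)
```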
Corralien

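You could also build the result directly from a dict of per-group Series, resetting each group's index so the rows line up: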

df_cols = pd.DataFrame({k: v.reset_index(drop=True) for k, v in df.groupby('doc')['token']})

Output:

   john  mary
0     1     4
1     2     5
2     3     6
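Because the dict is passed to the DataFrame constructor, the Series are aligned on their reset index, so groups of different lengths are padded with NaN rather than raising an error. A small sketch with hypothetical input in which mary has one token fewer than john:

```python
import pandas as pd

# Hypothetical input: 'mary' has one token fewer than 'john'.
df = pd.DataFrame({'doc': ['john', 'john', 'john', 'mary', 'mary'],
                   'token': [1, 2, 3, 4, 5]})

# reset_index(drop=True) relabels each group's rows 0, 1, 2, ...;
# the constructor then aligns the Series on those labels and fills
# the missing cell in the shorter column with NaN.
df_cols = pd.DataFrame({k: v.reset_index(drop=True)
                        for k, v in df.groupby('doc')['token']})
print(df_cols)
```

The padded column becomes float, since NaN cannot be stored in an int64 column.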
Nick ODell