3

I have a dataframe that looks like this.

import pandas as pd

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)
df

enter image description here

If I try a basic melt, like so...

df_melted = df.pivot_table(index='clean_words', columns='transaction')
df_melted.tail()

I get this...

enter image description here

What I really want is the transaction number as columns and the words listed down. So, if transaction1 was the column, these words would be listed in rows, under that column:

`'good','evening','how','are','you'`

Under transaction2, these words would be listed in rows, under that column:

'how','can','i','help'

How can I do that? The start_time and end_time are kind of superfluous here.

ASH
  • 20,759
  • 19
  • 87
  • 200
  • 1
    `df.pivot_table(index=df.groupby('transaction').transform('cumcount'), columns='transaction', values='clean_words', aggfunc=''.join)` https://i.stack.imgur.com/J6yaG.png – tdy Mar 30 '23 at 21:11
  • (BTW the title/tags mention melting, but this is actually pivoting) – tdy Mar 30 '23 at 21:19
  • 1
    Wow! Amazing! Can you explain how this works? I think it kind of makes sense, but not the 'cumcount' part. I know cumulative count, but what is it actually doing in this scenario? Thanks!! – ASH Mar 30 '23 at 22:33
  • 1
    Note how group 2 originally has indexes 5,6,7,8. When we pivot the transactions into columns, we instead want group 2's indexes to restart at 0,1,2,3. We use `index=df.groupby('transaction').cumcount()` to create this resetting index: group 1 has 0,1,2,3,4 and group 2 has 0,1,2,3. – tdy Mar 30 '23 at 22:56
  • (I just noticed `transform` is not needed in my original comment) – tdy Mar 30 '23 at 22:56

2 Answers2

1

Is this the format you want?

>>> pd.DataFrame({'1': ['good', 'evening', 'how', 'are', 'you'], '2': ['how', 'can', 'I', 'help', None]})
         1     2
0     good   how
1  evening   can
2      how     I
3      are  help
4      you  None

I haven't done that before but you could pivot your data and collect a list of words under each transaction column.

>>> df.pivot_table(columns='transaction', values='clean_words', aggfunc=list)
transaction                               1                    2
clean_words  [good, evening, how, are, you]  [how, can, i, help]

Or group by transaction and collect a list of words.

>>> df.groupby('transaction', as_index=False).agg(clean_words=pd.NamedAgg(column='clean_words', aggfunc=list))
   transaction                     clean_words
0            1  [good, evening, how, are, you]
1            2             [how, can, i, help]
1
import pandas as pd
import numpy as np

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)

df_melted = df.groupby('transaction')['clean_words'].apply(np.array).reset_index()

print(df_melted)

transaction                     clean_words
0            1  [good, evening, how, are, you]
1            2             [how, can, i, help]
mks2192
  • 306
  • 2
  • 11