I have an existing dataframe (coffee_directions_df) that looks like the following
coffee_directions_df
Utterance Frequency
Directions to Starbucks 1045
Directions to Tullys 1034
Give me directions to Tullys 986
Directions to Seattles Best 875
Show me directions to Dunkin 812
Directions to Daily Dozen 789
Show me directions to Starbucks 754
Give me directions to Dunkin 612
Navigate me to Seattles Best 498
Display navigation to Starbucks 376
Direct me to Starbucks 201
The DF shows utterances made by people and the frequency of utterances.
I.e., "Directions to Starbucks" was uttered 1045 times.
I am trying to figure out how to replace similar words, such as "Starbucks", "Tullys", "Seattles Best" in the column coffee_directions_df.Utterance
to one string, such as "Coffee". I've seen similar answers that suggest a dictionary, such as the following, but I haven't had success yet.
{'Utterance':['Starbucks','Tullys','Seattles Best'],
'Combi_Utterance':['Coffee','Coffee','Coffee','Coffee']}
{'Utterance':['Dunkin','Daily Dozen'],
'Combi_Utterance':['Donut','Donut']}
{'Utterance':['Give me','Show me','Navigate me','Direct me'],
'Combi_Utterance':['V_me','V_me','V_me','V_me']}
The desired output is as the following:
coffee_directions_df
Utterance Frequency Combi_Utterance
Directions to Starbucks 1045 Directions to Coffee
Directions to Tullys 1034 Directions to Coffee
Give me directions to Tullys 986 V_me to Coffee
Directions to Seattles Best 875 Directions to Coffee
Show me directions to Dunkin 812 V_me to Donut
Directions to Daily Dozen 789 Directions to Donut
Show me directions to Starbucks 754 V_me to Coffee
Give me directions to Dunkin 612 V_me to Donut
Navigate me to Seattles Best 498 V_me to Coffee
Display navigation to Starbucks 376 Display navigation to Coffee
Direct me to Starbucks 201 V_me to Coffee
Ultimately, I want to be able to use this code that I have to produce the final output.
df = (df.set_index('Frequency')['Utterance']
.str.split(expand=True)
.stack()
.reset_index(name='Words')
.groupby('Words', as_index=False)['Frequency'].sum()
)
print (df)
Words Frequency
0 Directions 6907
1 V_me 3863
2 Donut 2213
3 Coffee 5769
4 Other 376
Thanks!!