I have the following pandas dataframe df:
Book_Category | Book_Title | Revenue
Thriller | You don't know what I have done | 200
Romance | Last Summer I loved you | 100
I am trying to find a way to create a new dataframe with one row per word in the Book_Title column (note that upper and lower case should not matter).
This is the end goal df2:
Book_Title_word | Revenue
you | 300
I | 300
don't | 200
know | 200
what | 200
have | 200
done | 200
last | 100
summer | 100
loved | 100
Because the words I and you were in both titles, the revenue was summed for them.
Is this feasible in Python?
Thank you very much
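For reference, the transformation described above can be sketched as follows. This is only a minimal sketch, not a definitive solution: it assumes pandas 1.0 or later (for `explode` and the `ignore_index` argument), and it rebuilds the sample `df` from the table in the question. Because everything is lower-cased, the word "I" comes out as "i".

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Book_Category": ["Thriller", "Romance"],
    "Book_Title": ["You don't know what I have done", "Last Summer I loved you"],
    "Revenue": [200, 100],
})

# Lower-case each title, split it on whitespace, and give every word its own row.
# The Revenue of the title is repeated on each of its word rows.
words = (
    df.assign(Book_Title_word=df["Book_Title"].str.lower().str.split())
      .explode("Book_Title_word")
)

# Sum the revenue per word and sort from highest to lowest
df2 = (
    words.groupby("Book_Title_word", as_index=False)["Revenue"]
         .sum()
         .sort_values("Revenue", ascending=False, ignore_index=True)
)
print(df2)
```

Note that a word appearing twice in the same title would have that title's revenue counted twice with this approach; if that is not wanted, the word lists would need to be de-duplicated per title before exploding.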
UPDATE:
Because I am using larger numbers, when I use the solution provided by A-Za-z the revenue is displayed in scientific notation format ('2.155051e-01').
Book_Category | Book_Title | Revenue | Quantity
A | ...what ... | 3459283 | 45757
B | what ... | 4376899 | 35657
C | .....what | 4567856 | 7689
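# Split each title into words, keeping the title's Revenue next to every word
df_new = pd.DataFrame(df['Book_Title'].str.split(' ').tolist(), index=df['Revenue']).stack().reset_index()[[0, 'Revenue']]
df_new.columns = ['Book_Title_word', 'Revenue']
# Lower-case the words so that case does not matter
df_new['Book_Title_word'] = df_new['Book_Title_word'].str.lower()
# Sum the revenue per word and sort from highest to lowest
df_new.groupby('Book_Title_word').sum().sort_values(by='Revenue', ascending=False)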
Book_Title_word | Revenue
what | 2.160651e-01
This fixed the issue:
pd.set_option('display.float_format', lambda x: '%.3f' % x)
(taken from the answer to "Format / Suppress Scientific Notation from Python Pandas Aggregation Results")
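As a possible alternative to changing the option globally, the same display setting can be scoped to a single block with `pd.option_context`, or the summed column can be cast to an integer type if the revenue values are whole numbers. The sketch below is only illustrative: the `totals` frame is hypothetical and assumes the three sample revenues above are each counted once for the word "what".

```python
import pandas as pd

# Hypothetical aggregated result: the three sample revenues summed for "what"
totals = pd.DataFrame(
    {"Revenue": [3459283.0 + 4376899.0 + 4567856.0]},
    index=pd.Index(["what"], name="Book_Title_word"),
)

# Option 1: apply the float format only inside this block,
# leaving the global display settings untouched
with pd.option_context("display.float_format", "{:,.3f}".format):
    formatted = totals.to_string()
print(formatted)

# Option 2: if the revenue is always a whole number, store it as an integer,
# which is never rendered in scientific notation
as_int = totals["Revenue"].astype("int64")
print(as_int)
```

The scoped form is handy when only one report needs fixed-point output; `pd.set_option` remains simpler when the whole session should use it.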