I am trying to split Comments column into multiple rows containing each sentence. I used the following StackOverflow thread for my reference as it tends to give similar result. Reference Link: pandas: How do I split text in a column into multiple rows? Sample data of dataframe is as below.
Id Team Food_Text 1 X Food is good. It is very well cooked. Delicious! 2 X I hate the squid. Food is not cooked well. At all indeed. 3 X Please do not good anytime over here 4 Y I love the fish. Awesome delicacy. 5 Y Good for desserts. Meat tastes bad
Each record for 'Food_Text' can be of multiple sentences delimited by full-stop or period. I have used the following code
import numpy as np
import pandas as pd
survey_data = pd.read_csv("Food_Dummy.csv")
survey_text = survey_data[['Id','Team','Food_Text']]
# Getting s as pandas series which has split on full stop and new sentence a new line
s = survey_text["Food_Text"].str.split('.').apply(pd.Series,1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'Food_Text' # needs a name to join
# There are blank or emplty cell values after above process. Removing them
s.replace('', np.nan, inplace=True)
s.dropna(inplace=True)
x=s.to_frame(name='Food_Text1')
x.head(10)
# Joining should ideally get me proper output. But I am getting original dataframe instead of split one.
survey_text.join(x)
survey_text.head(10)
I am not sure why the join is not giving me a proper dataframe with more number of rows. Repetition of other columns based on index of split. So Id=1 has 3 sentences so we should have 3 records with all other data same and Food_Text column with a new sentence from a comment by ID=1. Similarly for other records.
Thank You in advance for your help! Regards, Sohil Shah