I have a pandas.DataFrame with 1604 paragraphs, as follows:
I want to extract all the sentences (even in a naive way, by splitting on dots) and build a new data frame that has one sentence per row and keeps the values of the other columns, especially the paragraph key (the index shown in the leftmost column).
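To make the naive splitting concrete, here is a minimal sketch on a toy frame. The frame `toy`, the helper `naive_sentences`, and the sample strings are made up for illustration; only the column names 'text', 'chapter' and 'sentences' are taken from my real data.

```python
import pandas as pd

# Toy stand-in for the real data (hypothetical values)
toy = pd.DataFrame({
    "chapter": [1, 1],
    "text": ["First sentence. Second sentence.", "Only one sentence here."],
})

def naive_sentences(paragraph):
    # Split on dots, drop empty fragments, and put the dot back on each piece
    return [part.strip() + "." for part in paragraph.split(".") if part.strip()]

# Each row now holds a list of sentences for its paragraph
toy["sentences"] = toy["text"].apply(naive_sentences)
```

This ignores abbreviations, decimals and other punctuation, which is acceptable for my purposes.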
I have worked on this and managed to carry the chapter column over to each sentence as follows:
import pandas as pd

# Create lists to fill with values
l_col1 = []
l_col2 = []
# Iterate over each row and fill the lists
for ix, row in dfAstroNova.iterrows():
    for value in row['sentences']:
        l_col1.append(value)
        l_col2.append(row['chapter'])
# Create a new dataframe from the two lists
df = pd.DataFrame({'sentences': l_col1,
                   'chapter': l_col2})
df = df.rename(columns={"sentences": "sents"})
which gives me this data frame (dfAstroNova is the name of the original data frame):
As you can see, I have the chapter key. My question is how to add the paragraph key (the index of the row in the main data frame that holds the 'text' column) to the new data frame.
In other words, I would like one more column that shows which paragraph of the original data frame each sentence belongs to, or better, one additional column that contains the corresponding paragraph text for each sentence.
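To show the shape of the result I am after, here is a minimal sketch on a toy frame. It assumes the 'sentences' column already holds a list of sentences per row and that the paragraph key is the frame's index; the frame `dfAstroNova_toy`, its values, and the new column name 'paragraph' are hypothetical. It uses DataFrame.explode (available in pandas 0.25 and later), which may or may not be the best approach for the real data.

```python
import pandas as pd

# Toy stand-in for dfAstroNova (hypothetical values); the index is the paragraph key
dfAstroNova_toy = pd.DataFrame(
    {
        "chapter": [1, 2],
        "text": ["A b. C d.", "E f."],
        "sentences": [["A b.", "C d."], ["E f."]],
    },
    index=[10, 11],
)

df_sents = (
    dfAstroNova_toy
    .explode("sentences")       # one row per sentence; the index value is repeated
    .rename_axis("paragraph")   # name the index so it becomes a labelled column
    .reset_index()              # move the paragraph key into a regular column
    .rename(columns={"sentences": "sents"})
    [["sents", "chapter", "paragraph", "text"]]
)
```

Each row of `df_sents` then carries the sentence, its chapter, the paragraph key, and the full paragraph text it came from.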