How can I extract sentences from paragraphs in a pandas.DataFrame and keep the paragraph key?

Question

I have a pandas.DataFrame with 1604 paragraphs as follows:

I want to extract all the sentences (even in a naive way, by splitting on dots) into a new data frame that has one sentence per row and keeps the values of the original columns, especially the paragraph key (the index, shown as the leftmost column).

I have worked on that and could provide the chapter column for each sentence as follows:

import pandas as pd

# Create lists to fill with values
l_col1 = []
l_col2 = []

# Iterate over each row and fill the lists
for ix, row in dfAstroNova.iterrows():
    for value in row['sentences']:
        l_col1.append(value)
        l_col2.append(row['chapter'])

# Create a new DataFrame from the two lists
df = pd.DataFrame({'sentences': l_col1,
                   'chapter': l_col2})
df = df.rename(columns={"sentences": "sents"})

which gives me this data frame (dfAstroNova is the name of the original data frame):

As you can see, I have the chapter key. My question is how to add the paragraph key (the row number of the text column entry in the original data frame) to the new data frame.

That would give me another column showing which paragraph in the original data frame each sentence belongs to or, better, an additional column that contains the corresponding paragraph for each sentence.

[Here](https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe) is a good solution to a similar problem. — PSK, Aug 25 '20 at 11:01
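The approach from the linked question can be sketched roughly as follows. This is only a minimal illustration, assuming pandas 0.25 or later (where `DataFrame.explode` is available). The frame `df_paragraphs`, its sample rows, and the column name `paragraph_id` are hypothetical stand-ins for the real data, and the sentence split is the naive split on dots mentioned in the question.

```python
import pandas as pd

# Hypothetical stand-in for the original data frame: one row per paragraph.
df_paragraphs = pd.DataFrame({
    "chapter": [1, 1, 2],
    "text": ["A b. C d.", "E f.", "G h. I j. K l."],
})

# Naive sentence split on dots; empty fragments are dropped.
df_paragraphs["sentences"] = df_paragraphs["text"].apply(
    lambda t: [s.strip() for s in t.split(".") if s.strip()]
)

df_sents = (
    df_paragraphs
    .rename_axis("paragraph_id")   # name the index so it survives as a column
    .explode("sentences")          # one row per sentence, index repeated
    .reset_index()                 # turn the repeated index into 'paragraph_id'
    .rename(columns={"sentences": "sents"})
)
```

Because `explode` repeats the original index for every sentence, each row of `df_sents` keeps the paragraph key along with the `chapter` and `text` values of its source row.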

score 0 · Answer 1 · edited Dec 18 '19 at 17:26

I have done it this way:

# Create lists to fill with values
l_col1 = []
l_col2 = []
l_col3 = []

# Iterate over each row and fill the lists
for ix, row in dfAstroNova.iterrows():
    for value in row['sentences']:
        l_col1.append(value)
        l_col2.append(row['chapter'])
        l_col3.append(row['text'])

# Create a new DataFrame from the three lists
df = pd.DataFrame({'sentences': l_col1,
                   'chapter': l_col2,
                   'paragraph': l_col3})
df = df.rename(columns={"sentences": "sents"})

This is the result:

Now, I only need to define a key for the paragraph and add it to the new table!
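One possible way to get that key is to reuse the row index that `iterrows()` already yields as `ix`. The sketch below assumes the index of the original data frame identifies each paragraph; the sample rows and the column name `paragraph_key` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for dfAstroNova with pre-split sentences.
dfAstroNova = pd.DataFrame({
    "chapter": [1, 2],
    "text": ["First. Second.", "Third."],
    "sentences": [["First.", "Second."], ["Third."]],
})

rows = []
for ix, row in dfAstroNova.iterrows():
    for sentence in row["sentences"]:
        rows.append({
            "sents": sentence,
            "chapter": row["chapter"],
            "paragraph_key": ix,      # index of the source paragraph row
            "paragraph": row["text"],
        })

df = pd.DataFrame(rows)
```

Each sentence row then carries the index of the paragraph it came from, so it can be joined back to the original data frame if needed.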
