Split list of paragraphs at periods (".")

Question

I have a list of paragraphs:

paragraphs = ['I do not like green eggs and ham. I am hungry, but I do not find anything to eat', '5.2. I do not like them Sam-I-am. I am Sam.', 'Blah, Blah, Blah']

I would like to split these paragraphs at each period (".") and get a single list containing every sentence, so I wrote this code:

import nltk

sentences = []
for paragraph in paragraphs:
  sentence = nltk.tokenize.sent_tokenize(paragraph)
  sentences.append(sentence)

I got a list of lists:

sentences = [['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat'], ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'], ['Blah, Blah, Blah']]

Instead I would like to get:

sentences = ['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat', '5.2.', 'I do not like them Sam-I-am.', 'I am Sam.', 'Blah, Blah, Blah']

How can I get this?

Does this answer your question? [How to make a flat list out of list of lists?](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists) — gre_gor, Jun 10 '20 at 18:43

score 1 · Accepted Answer · answered Jun 10 '20 at 18:47

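In your code, the variable sentence is itself a list of strings, because sent_tokenize returns one list per paragraph. Appending it adds the whole list as a single element, which is why you end up with a list of lists. You can fix that by appending each element of sentence to sentences individually.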

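import nltk

sentences = []
for paragraph in paragraphs:
  # sent_tokenize returns a list of sentence strings for this paragraph
  sentence = nltk.tokenize.sent_tokenize(paragraph)
  for i in sentence:
    # append each sentence on its own so the result stays flat
    sentences.append(i)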
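
As a possible alternative to the inner loop, the same flattening can be written with list.extend, a nested comprehension, or itertools.chain.from_iterable. The sketch below is only an illustration: so that it runs without NLTK or its tokenizer data, it uses split_sentences, a hypothetical regex-based stand-in that is not part of NLTK and is much cruder than nltk.tokenize.sent_tokenize. In real code you would presumably call sent_tokenize in its place.

```python
import re
from itertools import chain

paragraphs = [
    'I do not like green eggs and ham. I am hungry, but I do not find anything to eat',
    '5.2. I do not like them Sam-I-am. I am Sam.',
    'Blah, Blah, Blah',
]


def split_sentences(paragraph):
    # Hypothetical stand-in for nltk.tokenize.sent_tokenize (an assumption
    # made so the example is self-contained): split on whitespace that
    # directly follows a period.
    return re.split(r'(?<=\.)\s+', paragraph)


# Option 1: extend adds every item of the returned list, not the list itself.
sentences = []
for paragraph in paragraphs:
    sentences.extend(split_sentences(paragraph))

# Option 2: nested list comprehension, outer loop first.
sentences_comp = [s for p in paragraphs for s in split_sentences(p)]

# Option 3: chain.from_iterable concatenates the per-paragraph lists lazily.
sentences_chain = list(chain.from_iterable(split_sentences(p) for p in paragraphs))

print(sentences)
```

All three variants build the same flat list. extend is the smallest change to the original loop, while the comprehension and chain.from_iterable avoid the explicit accumulator.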
