I have a list of paragraphs:

paragraphs = ['I do not like green eggs and ham. I am hungry, but I do not find anything to eat', '5.2. I do not like them Sam-I-am. I am Sam.', 'Blah, Blah, Blah']

I would like to split these paragraphs at each period (".") and get a list containing each sentence, so I wrote this code:

import nltk

sentences = []
for paragraph in paragraphs:
  sentence = nltk.tokenize.sent_tokenize(paragraph)
  sentences.append(sentence)

I got a list of lists:

sentences = [['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat'], ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'], ['Blah, Blah, Blah']]

Instead I would like to get:

sentences = ['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat', '5.2.', 'I do not like them Sam-I-am.', 'I am Sam.', 'Blah, Blah, Blah']

How can I get this?

Tobitor

1 Answer

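In your code, the variable sentence is itself a list of strings, because sent_tokenize returns one list per paragraph. Calling sentences.append(sentence) therefore adds that whole list as a single element, which is why you end up with a list of lists. You can fix this by appending each element of sentence to sentences individually.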

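import nltk

sentences = []
for paragraph in paragraphs:
  # sent_tokenize returns a list of sentence strings for this paragraph
  sentence = nltk.tokenize.sent_tokenize(paragraph)
  # append each sentence on its own instead of appending the whole list
  for s in sentence:
    sentences.append(s)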
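
The inner loop can also be written more compactly. The following is only a sketch of three common ways to flatten a list of lists in Python. To keep it runnable without NLTK or its tokenizer data, it starts from the nested result shown in the question rather than calling sent_tokenize. The names nested, flat_extend, flat_comprehension and flat_chain are made up for this illustration.

```python
from itertools import chain

# Nested result copied from the question: one inner list per paragraph.
nested = [
    ['I do not like green eggs and ham.',
     'I am hungry, but I do not find anything to eat'],
    ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'],
    ['Blah, Blah, Blah'],
]

# Option 1: list.extend adds every element of the inner list individually.
flat_extend = []
for inner in nested:
    flat_extend.extend(inner)

# Option 2: a list comprehension with two for-clauses.
flat_comprehension = [s for inner in nested for s in inner]

# Option 3: itertools.chain.from_iterable joins the inner lists lazily.
flat_chain = list(chain.from_iterable(nested))

print(flat_extend)
```

Applied to the original loop, the first option amounts to replacing sentences.append(sentence) with sentences.extend(sentence).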
JST