Convert list of sentences to list of word tokens

Asked Jun 03 '18 at 17:12

Active Jun 04 '18 at 12:28

Viewed 24 times

I have a list of sentences:

data_corpus = ["John likes to watch movies",
               "Mary likes movies too",
               "John also likes to watch football games"]

I want to get a single flat list of word tokens:

['John', 'likes', 'to', 'watch', 'movies', 'Mary', 'likes', 'movies', 'too',
 'John', 'also', 'likes', 'to', 'watch', 'football', 'games']

I tried:

from nltk.tokenize import word_tokenize
tokenized = [word_tokenize(i) for i in data_corpus]
tokenized

and get a list of tokenized sentences (one inner list per sentence) instead of a flat list of words:

[['John', 'likes', 'to', 'watch', 'movies'],
 ['Mary', 'likes', 'movies', 'too'],
 ['John', 'also', 'likes', 'to', 'watch', 'football', 'games']]

How can I flatten this into a single list of words?


Edward

In your case an option could be `list(chain.from_iterable(map(word_tokenize, data_corpus)))` (with `from itertools import chain`). – miradulo Jun 03 '18 at 17:19
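
A minimal sketch of the `chain.from_iterable` idea from this comment. To keep it runnable without NLTK and its tokenizer data, it uses `str.split` as a stand-in tokenizer; that substitution and the helper name `flatten_tokens` are assumptions for illustration, not part of the question. Passing `word_tokenize` as the `tokenize` argument should give a result of the same shape.

```python
from itertools import chain

data_corpus = [
    "John likes to watch movies",
    "Mary likes movies too",
    "John also likes to watch football games",
]


def flatten_tokens(sentences, tokenize=str.split):
    """Tokenize each sentence and concatenate the results into one flat list.

    `tokenize` defaults to str.split as a stand-in (assumption); pass
    nltk.tokenize.word_tokenize to match the setup in the question.
    """
    # map() yields one token list per sentence; chain.from_iterable
    # walks those lists in order and yields their items one by one.
    return list(chain.from_iterable(map(tokenize, sentences)))


tokens = flatten_tokens(data_corpus)
print(tokens)
```

For these three sentences, which contain no punctuation, whitespace splitting is enough to produce the flat list the question asks for.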

Try `word_tokenize('\n'.join(data_corpus))` – alvas Jun 03 '18 at 21:48
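
A rough sketch of the join-then-tokenize idea, again with `str.split` standing in for `word_tokenize` (an assumption, so the example runs without NLTK). The helper name `tokenize_joined` is hypothetical. The nested list comprehension is included only to compare the result against flattening per-sentence token lists.

```python
data_corpus = [
    "John likes to watch movies",
    "Mary likes movies too",
    "John also likes to watch football games",
]


def tokenize_joined(sentences, tokenize=str.split):
    """Join all sentences into one string, then tokenize it once.

    `tokenize` defaults to str.split as a stand-in (assumption); pass
    nltk.tokenize.word_tokenize to follow the comment literally.
    """
    return tokenize("\n".join(sentences))


joined_tokens = tokenize_joined(data_corpus)

# Alternative: tokenize per sentence, then flatten with a nested comprehension.
nested = [sentence.split() for sentence in data_corpus]
flat_tokens = [token for sentence_tokens in nested for token in sentence_tokens]

print(joined_tokens == flat_tokens)
```

With whitespace splitting the two approaches agree on this data. With a real tokenizer the joined version may behave differently around sentence boundaries, which has not been verified here, so it is worth comparing both outputs on your own corpus.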

0 Answers