Tokenising a list of strings

Asked May 17 '20 at 03:31

Active Sep 06 '22 at 06:19

I am trying to tokenise a list of strings into a single flat list of words. For example:

a=['NEWS FLASH: popcorn-flavored Tic-Tacs', 'The way']

I would like the output to be:

a=['NEWS', 'FLASH:', 'popcorn-flavored', 'Tic-Tacs', 'The', 'way']

I tried this code:

from nltk.tokenize import word_tokenize
tokenized = [word_tokenize(i) for i in a]

but it returns a separate list for each string instead of one flat list:

[['NEWS', 'FLASH', ':', 'popcorn-flavored', 'Tic-Tacs'], ['The', 'way']]

leena

Then flatten the lists (that might not be the best solution, but it's a workaround). – xilpex May 17 '20 at 03:37
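
A minimal sketch of the flattening idea, assuming the nested result is exactly the one shown in the question. The names `nested` and `flat` are made up for illustration, and the nested list is hard-coded so the snippet runs without NLTK installed:

```python
from itertools import chain

# Nested result as shown in the question: one inner list per input string.
nested = [['NEWS', 'FLASH', ':', 'popcorn-flavored', 'Tic-Tacs'], ['The', 'way']]

# chain.from_iterable walks each inner list in order and yields its items,
# so wrapping it in list() gives one flat list of tokens.
flat = list(chain.from_iterable(nested))

print(flat)
```

A nested comprehension such as `[tok for sent in nested for tok in sent]` should give the same result without the import.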

`tokenized = word_tokenize(' '.join(a))` – deadshot May 17 '20 at 03:42
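
Note that `word_tokenize` splits punctuation into its own token, as the output in the question shows (`'FLASH', ':'`), whereas the desired output keeps `'FLASH:'` together. If splitting on whitespace alone is enough, a rough alternative is plain `str.split` in place of `word_tokenize`. This is a different technique from the comment above, sketched under that assumption; the name `words` is made up for illustration:

```python
# Input list from the question.
a = ['NEWS FLASH: popcorn-flavored Tic-Tacs', 'The way']

# Join the strings with a space, then split on runs of whitespace.
# Punctuation stays attached to its word, so 'FLASH:' remains one token.
words = ' '.join(a).split()

print(words)
```

This gives up NLTK's handling of punctuation and contractions, so it only suits cases where whitespace-delimited words are what is wanted.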

0 Answers