7

I'm trying to take a list of strings containing sentences such as:

sentence = ['Here is an example of what I am working with', 'But I need to change the format', 'to something more useable']

and convert it into the following:

word_list = ['Here', 'is', 'an', 'example', 'of', 'what', 'I', 'am',
'working', 'with', 'But', 'I', 'need', 'to', 'change', 'the', 'format',
'to', 'something', 'more', 'useable']

I tried using this:

for item in sentence:
    for word in item:
        word_list.append(word)

I thought it would take each string and append each item of that string to word_list, however the output is something along the lines of:

word_list = ['H', 'e', 'r', 'e', ' ', 'i', 's' .....etc]

I know I am making a stupid mistake but I can't figure out why, can anyone help?

Bill the Lizard
George Burrows

5 Answers

19

You need str.split() to split each string into words:

word_list = [word for line in sentence for word in line.split()]
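To see how the two `for` clauses are read, here is a minimal sketch (the shortened `sentence` list and the names `word_list_loop`, `flat` and `nested` are only illustrative). The clauses run left to right, exactly like nested loops, which is why the result is one flat list rather than a list of lists:

```python
sentence = ['Here is an example', 'But I need to change the format']

# The comprehension's for clauses read left to right, like these nested loops.
word_list_loop = []
for line in sentence:
    for word in line.split():
        word_list_loop.append(word)

# Two for clauses: one flat list of words.
flat = [word for line in sentence for word in line.split()]

# One for clause: one sub-list of words per sentence.
nested = [line.split() for line in sentence]
```

Here `flat` matches `word_list_loop`, while `nested` keeps each sentence's words grouped in its own list.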
Andrew Clark
Sven Marnach
  • Thanks again, I knew I was missing something easy like that, much appreciated! – George Burrows Dec 12 '11 at 17:56
  • This should be `[word for line in sentence for word in line.split()]`. – Andrew Clark Dec 12 '11 at 18:18
  • 1
    Upvoted, though keep in mind more than 2 clauses of iteration is generally frowned upon in list comprehensions. –  Dec 12 '11 at 18:21
  • I know it has been some time, but can I get an explanation of the code? I understand `[line for line in sentence]` but I don't understand the second half, `for word in line.split()`. How is it different from `[line.split() for line in sentence]`? – addicted Feb 22 '18 at 11:27
  • One important piece of syntax to learn here is a single list comprehension with multiple for loops, thanks. – shantanu pathak Feb 08 '19 at 13:31
7

Just .split and .join:

word_list = ' '.join(sentence).split(' ')
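One thing that may be worth checking with this approach: `split(' ')` and `split()` behave differently when the text has repeated or trailing spaces. A small sketch, assuming input that is messier than the example in the question (the strings below are made up for illustration):

```python
# Hypothetical input with a double space and a trailing space.
sentence = ['Here is  an example', 'with a trailing space ']

joined = ' '.join(sentence)

# An explicit ' ' separator yields empty strings around extra spaces.
with_sep = joined.split(' ')

# No argument: splits on runs of whitespace and drops empty strings.
no_sep = joined.split()
```

With the clean sentences from the question both calls give the same result, so this only matters if the real data is less tidy.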
Blender
4

You haven't told it how to distinguish a word. By default, iterating through a string simply iterates through the characters.

You can use .split(' ') to split a string by spaces. So this would work:

word_list = []  # start with an empty list to collect the words
for item in sentence:
    for word in item.split(' '):
        word_list.append(word)
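A quick way to see the difference for yourself is to compare the two kinds of iteration on a single short string (the variable names here are just for illustration):

```python
item = 'Here is'

# Iterating over a str yields one character at a time, spaces included.
chars = [ch for ch in item]

# Iterating over the result of split(' ') yields whole words.
words = [word for word in item.split(' ')]
```

`chars` is what the original loop was appending, which explains the `'H', 'e', 'r', 'e', ' ', ...` output in the question.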
Daniel Roseman
2
word_list = []  # start with an empty list to collect the words
for item in sentence:
    for word in item.split():
        word_list.append(word)
Emil Vikström
-1

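Split each sentence into words (rsplit() is a str method, so it has to be called on each string rather than on the list):

print([word for line in sentence for word in line.rsplit()])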
Isma