14

Reading from stdin gives me a list of strings, with each line of text as one list element. How do you split them all into single words?

mylist = ['this is a string of text \n', 'this is a different string of text \n', 'and for good measure here is another one \n']

wanted output:

newlist = ['this', 'is', 'a', 'string', 'of', 'text', 'this', 'is', 'a', 'different', 'string', 'of', 'text', 'and', 'for', 'good', 'measure', 'here', 'is', 'another', 'one']
Georgy
iFunction
  • Related: [How to split strings inside a list by given delimiter and flatten the sub-strings lists](https://stackoverflow.com/q/41700349/7851470). – Georgy Dec 14 '20 at 13:29

4 Answers

23

You can use a simple list comprehension:

newlist = [word for line in mylist for word in line.split()]

This generates:

>>> [word for line in mylist for word in line.split()]
['this', 'is', 'a', 'string', 'of', 'text', 'this', 'is', 'a', 'different', 'string', 'of', 'text', 'and', 'for', 'good', 'measure', 'here', 'is', 'another', 'one']
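Since the question mentions stdin, here is a minimal sketch of the same comprehension applied to a file-like object instead of a prebuilt list. It uses `io.StringIO` as a stand-in for `sys.stdin` so it runs without piped input; the name `fake_stdin` and the sample text are assumptions for illustration only.

```python
import io

# Stand-in for sys.stdin (assumption): in a real script you would iterate
# over sys.stdin itself or over the result of sys.stdin.readlines().
fake_stdin = io.StringIO(
    "this is a string of text \n"
    "this is a different string of text \n"
)

# Iterating a file-like object yields one line at a time, newline included.
# split() with no arguments splits on any whitespace, so the '\n' is dropped.
words = [word for line in fake_stdin for word in line.split()]
print(words)
```

The comprehension reads left to right like the equivalent nested loops: the outer `for line in ...` comes first, the inner `for word in line.split()` second.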
Willem Van Onsem
6

You could just do:

joined = ' '.join(mylist)

So you join the list into a single string. Then you can split it on whitespace, which also removes the \n's, because split() with no arguments treats newlines as whitespace:

words = joined.split()

Or if you want to do it in one line:

words = ' '.join(mylist).split()

This works in both Python 2 and Python 3.
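As a sanity check on the join-then-split idea, the sketch below joins the lines with `' '.join` and confirms that `split()` with no arguments discards the trailing newlines. It also shows why `str()` on a list is not a substitute for joining: `str()` returns the list's repr, so brackets and quotes end up inside the tokens. The shortened sample list and the name `first_repr_token` are assumptions for illustration.

```python
mylist = ['this is a string of text \n', 'and another one \n']

# Join the lines into one string, then split on any run of whitespace.
words = ' '.join(mylist).split()
print(words)

# str() gives the repr of the list, so the first token keeps the
# opening bracket and quote character.
first_repr_token = str(mylist).split()[0]
print(first_repr_token)
```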

SollyBunny
3

Besides the list comprehension answer above, which I vouch for, you could also do it with a for loop:

# Start with an empty list
newlist = []
# Iterate over the items in mylist
for item in mylist:
    # Split the string into a list of words
    item_words = item.split()
    # Extend newlist with all of those words
    newlist.extend(item_words)
print(newlist)

When the loop finishes, newlist will contain every word from every element of mylist.

But the Python list comprehension looks much nicer, and you can do awesome things with it. Check here for more:

https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

Ouss
  • Yes, thanks for putting me onto this, I was studying it all weekend. It is a nice and elegant way to solve the problem. My main concern is speed and efficiency, and it seems to me that list comprehensions, being part of the built-in Python language, would be quicker than a loop. – iFunction May 22 '17 at 09:00
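The speed question in the comment above can be measured rather than assumed. This is a minimal sketch using `timeit` from the standard library; the sample data, the repeat count, and the helper names `with_comprehension` and `with_loop` are arbitrary assumptions, and the timings will vary by machine and Python version.

```python
import timeit

# Hypothetical sample data: the same line repeated many times.
mylist = ['this is a string of text \n'] * 1000

def with_comprehension(lines):
    return [word for line in lines for word in line.split()]

def with_loop(lines):
    result = []
    for line in lines:
        result.extend(line.split())
    return result

# Both approaches must agree before comparing their timings.
assert with_comprehension(mylist) == with_loop(mylist)

t_comp = timeit.timeit(lambda: with_comprehension(mylist), number=100)
t_loop = timeit.timeit(lambda: with_loop(mylist), number=100)
print(f"comprehension: {t_comp:.4f}s, loop with extend: {t_loop:.4f}s")
```

Both versions do the same `split()` work per line, so any difference comes from how the results are collected; measuring on your own input is the only reliable way to compare.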
2

Alternatively, you can map the str.split method over every string in the list and then chain the resulting lists together with itertools.chain.from_iterable:

from itertools import chain

mylist = ['this is a string of text \n', 'this is a different string of text \n', 'and for good measure here is another one \n']
result = list(chain.from_iterable(map(str.split, mylist)))
print(result)
# ['this', 'is', 'a', 'string', 'of', 'text', 'this', 'is', 'a', 'different', 'string', 'of', 'text', 'and', 'for', 'good', 'measure', 'here', 'is', 'another', 'one']
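If the input is large, the `list(...)` call can be dropped so that words are produced lazily. The sketch below assumes Python 3, where `map` returns an iterator; the helper name `iter_words` is hypothetical, and `itertools.islice` is used only to take the first few words without building the whole list.

```python
from itertools import chain, islice

def iter_words(lines):
    """Return a lazy iterator over the words in an iterable of lines."""
    # Hypothetical helper: nothing is split until the iterator is consumed.
    return chain.from_iterable(map(str.split, lines))

mylist = ['this is a string of text \n', 'this is a different string of text \n']

# Take only the first three words; later lines are never split.
first_three = list(islice(iter_words(mylist), 3))
print(first_three)
```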
Georgy