0

I want to split my list elements on spaces so that each word becomes its own list element. For example, if I have the list:

['Hello world', 'testing', 'testing two']

I'd want the list to look like:

['Hello', 'world', 'testing', 'testing', 'two']

The issue I'm having is that I am reading from a file. I already stripped the newline characters, but when I tried to strip the spaces as well, it didn't seem to work. Below is my code:

with open(fname, 'r') as f:
  words = [line.strip().strip(' ') for line in f]
print words

This just prints the first list shown above, with the list elements still containing spaces.

If anyone could help me out, that'd be great! Thanks!

user1871869
  • possible duplicate of [returning a list of words after reading a file in python](http://stackoverflow.com/questions/13259288/returning-a-list-of-words-after-reading-a-file-in-python) – kojiro Oct 20 '13 at 01:53

5 Answers

3

I would do something like this:

" ".join(list).split(" ")

That will join the list together and then split it apart. There are probably somewhat more efficient ways, but this way is simple.
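
One caveat, sketched here under the assumption that the input might contain repeated spaces: splitting on a literal " " keeps an empty string between consecutive spaces, while calling split() with no argument collapses any run of whitespace. The names `items` and `flatten_words` are only illustrative.

```python
def flatten_words(items):
    """Join the strings with a single space, then split on any run of whitespace."""
    return " ".join(items).split()

# Note the double space in the last element.
items = ['Hello world', 'testing', 'testing  two']

# Splitting on a literal space leaves an empty string where the double space was.
with_sep = " ".join(items).split(" ")

# Splitting with no argument drops the empty string.
without_sep = flatten_words(items)
```

If the data never has repeated spaces, both forms give the same result.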

Eric Pauley
2

split() splits on any white space by default, so you can do the whole file in one easy step.

words = f.read().split()

If you want to avoid reading the whole file into memory with f.read():

words = [word for line in f for word in line.split()]
Steven Rumbalski
  • I thought about posting this as an answer ... It can have some problems for really big files, but generally these days that's probably not a concern. – mgilson Oct 20 '13 at 02:00
  • @mgilson: I thought about the large file issue as well, but figured that if he has enough memory to hold all the words individually, he probably has enough memory for the whole chunk. – Steven Rumbalski Oct 20 '13 at 03:24
1

.strip only removes characters from the beginning or end of a string. What you want is to split the string on whitespace:

lines_split = [line.split() for line in f]

This will give you a nested list which you can easily flatten. See for example this answer or this one.
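
As a rough sketch of the flattening step, a nested list comprehension is one option. Here `io.StringIO` is only a stand-in for the open file `f`, and the sample text is made up for illustration.

```python
import io

# Stand-in for an open file; any iterable of lines behaves the same way.
f = io.StringIO("Hello world\ntesting\ntesting two\n")

# One inner list of words per line.
lines_split = [line.split() for line in f]

# Flatten: the outer loop walks the lines, the inner loop walks each line's words.
words = [word for line_words in lines_split for word in line_words]
```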

My preferred approach here would be to write a simple generator to yield a word at a time. Then you can turn it into a list later if you need to:

def get_words(filename):
    with open(filename) as fin:
        for line in fin:
            for word in line.split():
                yield word

There's some magic you can do to condense this down with itertools, but this should suffice for now.
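
For the curious, one possible itertools version might look like the sketch below. It assumes you pass in an iterable of lines rather than a filename, so the caller keeps the file open while the words are consumed; `iter_words` and `sample_lines` are hypothetical names.

```python
from itertools import chain

def iter_words(lines):
    """Lazily yield each whitespace-separated word from an iterable of lines."""
    return chain.from_iterable(line.split() for line in lines)

# A list of lines stands in for an open file object here.
sample_lines = ["Hello world\n", "testing\n", "testing two\n"]
words = list(iter_words(sample_lines))
```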

Community
mgilson
0

You are looking for the split method. The simplest way to do what you want looks like this:

words = []
with open(fname) as f:
  for line in f:
    words.extend(line.split())

and the slightly cleverer method looks like this:

import itertools
with open(fname) as f:
  words = list(itertools.chain.from_iterable(l.split() for l in f))

I don't know which is faster. Note that when called without a separator argument, split effectively does what strip does as well as splitting on interior whitespace, so you needn't bother calling strip first.
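
To illustrate that last point, here is a small check on a made-up line with leading and trailing whitespace (the sample string is only an assumption for the demo):

```python
line = "  Hello world \n"

# strip() only trims the ends; the interior space is untouched.
stripped_only = line.strip()

# split() with no argument ignores leading/trailing whitespace and
# splits on interior whitespace, so a preceding strip() changes nothing.
split_only = line.split()
stripped_then_split = line.strip().split()
```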

zwol
0

I like Zonedabone's answer. But here is another way:

>>> from itertools import chain
>>> l = ['Hello world', 'testing', 'testing two']
>>> result = list(chain.from_iterable(w.split() for w in l))
# ['Hello', 'world', 'testing', 'testing', 'two']
mshsayem
  • for what it's worth, `chain.from_iterable(w.split() for w in l)` is generally preferable to `chain(*[...])`. The latter pretty much gets rid of all of the advantage of using iterable objects in the first place. – mgilson Oct 20 '13 at 02:12