I'm working with a text file in which some lines also start with spaces. When I try to remove the \n characters using the strip method in a list comprehension, I get a list with empty elements (" ") and I don't know how to delete them. My code is:

with open(filename) as f:
    testo= f.readlines()
[e.strip() for e in testo]

but I get a list like this:

[' ', ' ', 'word1', 'word2', 'word3', ' ']

Can I solve this with the strip method, or do I need another method?

Patrick Artner
ChiG

3 Answers

You are getting those empty strings because some of the lines contain nothing but a line break. Here's code that weeds out the empty strings:

with open(filename) as f:
    testo = [e.strip() for e in f.readlines()]
    final_list = list(filter(lambda x: x != '', testo))
    print(final_list)

Without lambda and using map:

with open(filename) as f:
    final_list = list(filter(bool, map(str.strip, f)))
    print(final_list)

Another solution is:

with open(filename) as f:
    # splitlines() drops the line breaks but does not strip other whitespace
    testo = [x for x in f.read().splitlines() if x]
    print(testo)

The source for the second solution is: https://stackoverflow.com/a/15233379/2988776

For performance improvements, refer to @Patrick's answer.
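To see how the strip-based and splitlines-based snippets differ on the same input, here is a small sketch. It uses `io.StringIO` as a stand-in for the open file, and the sample text and helper names are made up for illustration.

```python
import io

# Made-up sample: an empty line, a whitespace-only line, and padded words.
SAMPLE = "\n  \n  word1\nword2  \n\n"

def strip_then_filter(f):
    # Strip each line, then drop the ones that became empty.
    return list(filter(bool, map(str.strip, f)))

def splitlines_only(f):
    # Remove the line breaks but leave all other whitespace alone.
    return [x for x in f.read().splitlines() if x]

stripped = strip_then_filter(io.StringIO(SAMPLE))
unstripped = splitlines_only(io.StringIO(SAMPLE))

print(stripped)    # ['word1', 'word2']
print(unstripped)  # ['  ', '  word1', 'word2  ']
```

The second result still contains the whitespace-only line and the padding around the words, which matters if the goal is a list of bare words.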

yivi
Ravinder Payal
  • @JoachimIsaksson yeah, I forgot to add a line break after testo = f.readlines(); I have edited the code now. – Ravinder Payal Nov 18 '18 at 08:31
  • @PatrickArtner the list is built twice in your answer as well; the difference is that I do it in two steps and you do it inline. Check what happens inside the parentheses in this code: `[x for x in (line.strip() for line in f.readlines()) if x]` – Ravinder Payal Nov 18 '18 at 08:42
  • You don't need to use `lambda x: x != ''`, because it's equivalent to `bool` on `str` instances. And `filter`/`map` work faster with builtin functions. – Eli Korvigo Nov 18 '18 at 08:43
  • @EliKorvigo same as Patrick said: `Ah docs.python.org/3/library/functions.html#bool .. did not know that one` Now adding sample for that as well. – Ravinder Payal Nov 18 '18 at 08:45
  • Not to be a downer, but the splitlines solution (if `t` is changed to `f` so it can run) will remove line feeds correctly but still return empty lines. – Joachim Isaksson Nov 18 '18 at 08:51
  • @RavinderPayal if you do `[x for x in (line.strip() for line in f) if x]` the list is only built _once_ because the inner part is a _generator_, not a list comp. Using `[x for x in [line.strip() for line in f] if x]` builds two lists – Patrick Artner Nov 18 '18 at 08:53
  • @PatrickArtner ah, I didn't know the internals of this. If that is true, I agree. – Ravinder Payal Nov 18 '18 at 08:54
  • Your "better solution" is far worse performance-wise: you read the entire file into memory, then you create a list via `str.splitlines` and, finally, you create another list via the listcomp. You've increased the memory footprint and the number of iterations by a factor of 3. – Eli Korvigo Nov 18 '18 at 09:42

You can use a generator to read all the lines and strip() the unwanted newlines.

From the generator, you keep only the elements that are truthy; empty strings are considered falsy.

Advantage: you create only one list and get rid of empty strings:

Write file:

filename = "t.txt"
with open(filename,"w") as f:
    f.write("""

  c
  oo
  l

  te
  xt
  """)

Process file:

with open(filename) as f:
    testo = [x for x in (line.strip() for line in f) if x] # f.readlines() not needed. f is
                                                          # an iterable in its own right

print(testo)  # ['c', 'oo', 'l', 'te', 'xt']

You could do it similarly with a single comprehension:

testo = [line.strip() for line in f if line.strip()]

but that would execute strip() twice and would be slightly less efficient.
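If you prefer the single comprehension but want to call strip() only once per line, an assignment expression is one option. This is a sketch that assumes Python 3.8 or newer; `io.StringIO` stands in for the open file and holds the same text as the file written above.

```python
import io

# Stand-in for the open file, with the same content as t.txt above.
f = io.StringIO("\n\n  c\n  oo\n  l\n\n  te\n  xt\n  ")

# (s := line.strip()) strips once and binds the result to s;
# the truthiness of s then decides whether the line is kept.
testo = [s for line in f if (s := line.strip())]

print(testo)  # ['c', 'oo', 'l', 'te', 'xt']
```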

Output:

['c', 'oo', 'l', 'te', 'xt']


A suggested alternative from Eli Korvigo is:

testo = list(filter(bool, map(str.strip, f)))

which is essentially the same: it replaces the explicit list comprehension over a generator expression with a map of str.strip over f (which yields a lazy iterator) and applies a filter to that before feeding it into a list.

See the built-in functions documentation for filter, map, and bool.
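As a quick check that the two variants are interchangeable, the sketch below runs both on the same made-up lines and compares the results. The `lines` list is a hypothetical stand-in for the file object f.

```python
lines = ["\n", "  c\n", "  \n", "  oo\n"]  # hypothetical stand-in for f

# List comprehension fed by a generator expression.
via_genexp = [x for x in (line.strip() for line in lines) if x]

# map/filter pipeline consumed by list().
via_map_filter = list(filter(bool, map(str.strip, lines)))

# In Python 3, map() returns a lazy iterator rather than a list,
# so no intermediate list is built before list() consumes it.
lazy = map(str.strip, lines)
is_lazy = not isinstance(lazy, list)

print(via_genexp)                    # ['c', 'oo']
print(via_genexp == via_map_filter)  # True
print(is_lazy)                       # True
```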

I like mine better though ;o)

Patrick Artner
  • Or you can run `list(filter(bool, map(str.strip, lines)))` to avoid double `strip` calls. I also believe it would be nice to point out that there is no need to call `readlines` in the first place: the OP can iterate over the file handle directly. – Eli Korvigo Nov 18 '18 at 08:38
  • Ah https://docs.python.org/3/library/functions.html#bool .. did not know that one – Patrick Artner Nov 18 '18 at 08:41
  • @EliKorvigo added your suggestion , ping me if you want to create an alternate answer then I remove it again. Comments can vanish, so I'd rather preserve good ones in the answer. – Patrick Artner Nov 18 '18 at 08:48
  • The only thing I'd add is a cleaner way of writing this code. Since listcomps and `map`/`filter` higher-order function are all borrowed from functional languages, nesting them in parentheses is not pretty. If you install package `fn` (e.g. via pip or conda), you can do better: `with open(...) as lines: testo = (F(map, str.strip) >> (filter, bool) >> list)(lines)` (this assumes an import: `from fn import F`) – Eli Korvigo Nov 18 '18 at 09:39

From the data you showed us, it looks like there is a line with just a space in it. With that in mind, you have to decide whether this is something you want or not.

In case you want it, then your code should look something like this:

with open(filename) as f:
    testo = f.readlines()
# drop empty lines, but keep lines that contain only spaces
result = list(filter(None, (l.rstrip('\n') for l in testo)))

In case you don't want lines with just whitespace characters, you can do something like:

with open(filename) as f:
    testo = f.readlines()
# drop empty and whitespace-only lines; remove only the trailing newline
result = [e.rstrip('\n') for e in testo if e.strip()]

In this case, we avoid turning " a word with leading and trailing spaces " into "a word with leading and trailing spaces", since in some cases stripping might change the semantics of the line :)
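The difference between the two variants is easiest to see side by side. The sketch below is an illustration only: `io.StringIO` stands in for the file, and the sample text and variable names are made up.

```python
import io

# Made-up sample: a space-only line, a padded line, an empty line, a plain line.
SAMPLE = " \n  padded word  \n\nplain\n"

with io.StringIO(SAMPLE) as f:
    testo = f.readlines()

# Variant 1: drop only truly empty lines; whitespace-only lines survive.
keep_blank = list(filter(None, (l.rstrip('\n') for l in testo)))

# Variant 2: drop whitespace-only lines too, but keep the padding of the rest.
drop_blank = [e.rstrip('\n') for e in testo if e.strip()]

print(keep_blank)  # [' ', '  padded word  ', 'plain']
print(drop_blank)  # ['  padded word  ', 'plain']
```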

VePe