50

I have a large string which I split by newlines. How can I remove all lines that are empty (whitespace only)?

Pseudocode:

for stuff in largestring:
   remove stuff that is blank
ViFI
  • [Personally, I found the answer here to be the best solution](http://stackoverflow.com/questions/1140958/whats-a-quick-one-liner-to-remove-empty-lines-from-a-python-string#answer-24172715) – Dmitriy Oct 05 '16 at 06:00
  • A one-liner to remove empty lines (without whitespace) is [this](http://stackoverflow.com/a/1140966/2373278). The question title could be changed to 'Remove empty lines with whitespace only in Python'. – ViFI Nov 03 '16 at 06:41

13 Answers

71

Try a list comprehension and str.strip():

>>> mystr = "L1\nL2\n\nL3\nL4\n  \n\nL5"
>>> mystr.split('\n')
['L1', 'L2', '', 'L3', 'L4', '  ', '', 'L5']
>>> [line for line in mystr.split('\n') if line.strip()]
['L1', 'L2', 'L3', 'L4', 'L5']
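If the "large string" actually comes from a file, you may not need to build it at all. The following is a minimal sketch that assumes the lines are read from a file object, simulated here with io.StringIO. The generator name non_blank_lines is only illustrative. It applies the same strip() test lazily, one line at a time:

```python
import io
from typing import Iterable, Iterator


def non_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield only lines that contain a non-whitespace character."""
    for line in lines:
        # '' and whitespace-only lines (including a bare '\n') strip to ''.
        if line.strip():
            yield line


# io.StringIO stands in for a real file object returned by open(path).
source = io.StringIO("L1\nL2\n\nL3\nL4\n  \n\nL5")
result = "".join(non_blank_lines(source))
print(repr(result))
```

The lines keep their own newline characters, so joining with "" rebuilds the text without the blank lines.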
Gilles Quénot
gimel
52

Using regex:

import re

if re.match(r'^\s*$', line):
    pass  # line is empty (contains only whitespace such as \t, \n, \r and spaces)

Using regex + filter():

# In Python 3, filter() returns a lazy iterator; wrap it in list() if you need a list.
filtered = list(filter(lambda x: not re.match(r'^\s*$', x), original))

As seen on codepad.
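If you apply the regex to many lines, you can compile it once. In this short sketch, the names BLANK and drop_blank_lines are illustrative. It keeps the same ^\s*$ test but uses a list comprehension instead of filter():

```python
import re

# Compiled once; BLANK.match(line) succeeds for '' and whitespace-only lines.
BLANK = re.compile(r'^\s*$')


def drop_blank_lines(lines):
    """Return a new list without empty or whitespace-only lines."""
    return [line for line in lines if not BLANK.match(line)]


print(drop_blank_lines(['L1', 'L2', '', 'L3', 'L4', '  ', '', 'L5']))
```

The re module already caches recently used patterns, so the speed gain from compiling is usually modest. The main benefit is that the pattern is defined in one place.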

NullUserException
  • Thanks for all the answers; however, this solution was exactly what I had been looking for! Thanks a lot –  Sep 14 '10 at 19:01
  • gimel's solution, with re-joining the text afterwards, gives far better performance. I compared the two solutions on a small text (10 lines, of which 3 were blank). Here are the results: regex: `1000 loops, best of 3: 452 us per loop`; join, split & strip: `100000 loops, best of 3: 5.41 us per loop` – m01 May 28 '13 at 08:48
  • Introductory video: [Python Tutorial: re Module - How to Write and Match Regular Expressions (Regex) - YouTube](https://www.youtube.com/watch?v=K8L6KVGG-7o) – Ooker Aug 23 '19 at 13:28
26

I also tried the regex and list-comprehension solutions, and the list one is faster.

Here is my solution (based on previous answers):

text = "\n".join([ll.rstrip() for ll in original_text.splitlines() if ll.strip()])
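To check the speed claim on your own data, this rough sketch uses timeit to compare a regex filter with the split/strip/join version. The sample text and the function names are assumptions for illustration; the text mirrors the 10-line, 3-blank test described in an earlier comment. Timings vary by machine and Python version, so no numbers are given here:

```python
import functools
import re
import timeit

# Hypothetical sample: 10 lines, 3 of them blank.
TEXT = "a\nb\n\nc\nd\n  \ne\n\t\nf\ng"


def with_regex(text):
    # Regex approach: keep lines that are not empty or whitespace-only.
    return "\n".join(line for line in text.split("\n")
                     if not re.match(r'^\s*$', line))


def with_strip(text):
    # split/strip/join approach from this answer.
    return "\n".join(ll.rstrip() for ll in text.splitlines() if ll.strip())


# Sanity check: both produce the same text for this input.
assert with_regex(TEXT) == with_strip(TEXT)

for fn in (with_regex, with_strip):
    seconds = timeit.timeit(functools.partial(fn, TEXT), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10,000 calls")
```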
Regisz
13
lines = bigstring.split('\n')
lines = [line for line in lines if line.strip()]
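One thing to watch with split('\n'): if the text has Windows line endings (\r\n), every kept line still ends in '\r'. The strip() test still drops the blank lines, because '\r' counts as whitespace. By contrast, str.splitlines() also removes the '\r' from the surviving lines. Here is a small illustration with made-up sample text:

```python
text = "L1\r\nL2\r\n\r\n  \r\nL3"

# split('\n') leaves the '\r' of each CRLF pair on the line.
by_split = [line for line in text.split('\n') if line.strip()]
# splitlines() treats \n, \r\n and \r as line boundaries.
by_splitlines = [line for line in text.splitlines() if line.strip()]

print(by_split)       # ['L1\r', 'L2\r', 'L3']
print(by_splitlines)  # ['L1', 'L2', 'L3']
```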
nmichaels
8

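I'm surprised nobody has suggested a single re.sub over the whole string (oh, because you've already split it... but why?):

>>> import re
>>> a = "Foo\n \nBar\nBaz\n\n   Garply\n  \n"
>>> print(a)
Foo

Bar
Baz

   Garply


>>> print(re.sub(r'\n\s*\n', '\n', a))
Foo
Bar
Baz
   Garply

>>> 

Note: don't pass re.MULTILINE as the fourth positional argument of re.sub. That slot is count, so it would cap the call at 8 replacements (re.MULTILINE == 8). Pass flags by keyword (flags=re.MULTILINE) when you need them. This pattern has no ^ or $ anchors, so it doesn't need MULTILINE at all.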
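If you do want re.MULTILINE, it pays off with anchored patterns. In the sketch below, the function name remove_blank_lines is illustrative. Here ^ matches at the start of every line, so each whitespace-only line is deleted together with its newline. This includes a blank first line, which the \n\s*\n pattern leaves in place:

```python
import re


def remove_blank_lines(text: str) -> str:
    # With re.MULTILINE, ^ matches at the start of every line. Since the
    # match must end in '\n', it only ever covers whole whitespace-only lines.
    # A whitespace-only last line with no trailing newline is not matched.
    return re.sub(r'^\s*\n', '', text, flags=re.MULTILINE)


a = "Foo\n \nBar\nBaz\n\n   Garply\n  \n"
print(remove_blank_lines(a))
```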
mushuweasel
  • On a multiline sub, `\s*` will match any number of `\n` and any other whitespace:
    >>> import re
    >>> a = "foo\n \n\t\n \nbar\n\n \n baz"
    >>> print(re.sub(r'\n\s*\n', '\n', a))
    foo
    bar
     baz
    – mushuweasel Aug 13 '19 at 21:17
3

If you are not willing to try regex (which you should), you can use this:

s.replace('\n\n','\n')

Repeat this several times to make sure no empty lines are left, or chain the calls:

s.replace('\n\n','\n').replace('\n\n','\n')

Note that this only removes truly empty lines; lines containing just spaces or tabs are not removed.
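Rather than guessing how many times to repeat the call, you can loop until no two newlines are adjacent. This is a minimal sketch, and collapse_empty_lines is an illustrative name. Like the replace() calls it is based on, it only removes truly empty lines:

```python
def collapse_empty_lines(s: str) -> str:
    # Each pass shortens every run of consecutive newlines, so keep going
    # until no two newlines are adjacent.
    while '\n\n' in s:
        s = s.replace('\n\n', '\n')
    return s


print(repr(collapse_empty_lines("a\n\n\n\nb\n \nc")))  # 'a\nb\n \nc' (the ' ' line survives)
```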


Just to encourage you to use regex, here are two introductory videos that I find intuitive:
Regular Expressions (Regex) Tutorial
Python Tutorial: re Module

Ooker
  • You may want to use a regular expression, for example. "Repeat several lines to be sure" is not a good idea when you are coding, as you may leave things unsolved or waste time running something more times than needed. – Enrico Jun 21 '16 at 04:55
  • +1 to regex, but as a lazy hack (or if importing the regex module is too slow) you can chain replace statements: `s.replace('\n\n','\n').replace('\n\n','\n')` Tested on 3.6. – evan_b Jun 16 '17 at 05:28
  • @evan_b didn't think of chaining commands. Which one will be executed first? – Ooker Jun 16 '17 at 15:03
  • Execution order appears to be left to right, but I wasn't able to find that documented anywhere after searching briefly, so it may not be safe to rely on that for order-sensitive replacements. – evan_b Jun 17 '17 at 22:53
2

You can simply use rstrip() to test each line:

    for stuff in largestring.splitlines():
        if stuff.rstrip():  # whitespace-only lines strip to '' and are skipped
            print(stuff)
Rahul Pandey
1

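I use this solution to delete empty lines and join everything together as one line:

import re
match_p = re.sub(r'\s{2}', '', my_txt)  # my_txt is the input text

Be aware that \s{2} deletes whitespace two characters at a time: "a\n\nb" becomes "ab", but "a\nb" is left unchanged, and "a  b" also becomes "ab".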
Cody Gray - on strike
red fred
0

My version:

# Removes only exactly-empty strings; whitespace-only items such as '  ' stay.
while '' in all_lines:
    all_lines.remove('')
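The loop may be meant to modify the existing list object, for example because other code holds a reference to it. In that case, slice assignment does the job in a single pass instead of searching the list again for every removal. Its strip() test also catches whitespace-only lines. This is a sketch; remove_blank_in_place is an illustrative name:

```python
def remove_blank_in_place(all_lines: list) -> None:
    # Build the filtered list once, then assign it into the full slice so the
    # existing list object is updated (other references see the change).
    all_lines[:] = [line for line in all_lines if line.strip()]


all_lines = ['L1', '', 'L2', '  ', '', 'L3']
alias = all_lines  # a second reference to the same list object
remove_blank_in_place(all_lines)
print(alias)  # ['L1', 'L2', 'L3']
```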
Radren
0

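Use a positive lookbehind regex:

import re
s = re.sub(r'(?<=\n)\s+', '', s)

Note that \s+ also swallows the indentation at the start of the next non-blank line (e.g. "foo\n    bar" becomes "foo\nbar"). If you ever need flags, pass them as flags=..., since the fourth positional argument of re.sub is count.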

When you input:

foo
<tab> <tab>

bar

The output will be:

foo
bar
Kxrr
0
str_with_space = """
    example line 1

    example line 2
    example line 3

    example line 4"""

new_str = '\n'.join(el.strip() for el in str_with_space.split('\n') if el.strip())
print(new_str)

Output:

example line 1
example line 2
example line 3
example line 4
Patrick
0

You can combine map and strip to remove spaces and use filter(None, iterable) to remove empty elements:

string = "a\n \n\nb"
list_of_str = string.split("\n")
list_of_str = filter(None, map(str.strip, list_of_str))
list(list_of_str)

Returns: ['a', 'b']
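Here is a variation on this. Passing str.strip itself as the filter predicate keeps the original lines, indentation included, while still dropping empty and whitespace-only ones. This works because filter() only checks whether the stripped string is truthy:

```python
string = "a\n  indented\n \n\nb"

# str.strip(line) is '' (falsy) for blank lines, so filter() discards them,
# but it yields the original, unstripped line for the rest.
kept = list(filter(str.strip, string.split("\n")))
print(kept)  # ['a', '  indented', 'b']
```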

mathause
-1

Along the same lines as @NullUserException's answer, this is how I write it:

import re
removedWhitespace = re.sub(r'^\s*$', '', line)  # a whitespace-only line becomes '', which you still need to filter out
Reihan_amn