
Let's say I have this data:

data = '''a, b, c
d, e, f
g. h, i
  
j, k , l


'''

The 4th line contains a single space; the 6th and 7th lines contain no characters at all, just a newline.

Now when I split it using splitlines():

data.splitlines()

I get

['a, b, c', 'd, e, f', 'g. h, i', ' ', 'j, k , l', '', '']

However, I expected just

['a, b, c', 'd, e, f', 'g. h, i', 'j, k , l']

Is there a simple solution using regular expressions to do this?

Please note that I know the other way of doing this: filtering empty strings from the output of splitlines().

I am not sure if the same can be achieved using regex.

When I use regex to split on new line, it gives me

import re
re.split("\n", data)

Output:

['a, b, c', 'd, e, f', 'g. h, i', ' ', 'j, k , l', '', '', '']
Pranav Hosangadi
  • Does this answer your question? [Remove empty strings from a list of strings](https://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings) – Pranav Hosangadi Jun 17 '22 at 05:36
  • @PranavHosangadi as I said, I don't want to filter later, can you suggest me a regex kind of solution? –  Jun 17 '22 at 05:38
  • Why are you so insistent on using *re* when you've been offered a perfectly good answer that requires no additional imports? – DarkKnight Jun 17 '22 at 05:44

2 Answers


List comprehension approach

You can keep only the lines that are neither empty nor whitespace-only by adding a condition to the comprehension.

A string that is still non-empty after stripping whitespace is truthy, so the condition el.strip() keeps exactly the lines that contain at least one non-whitespace character.

filtered_data = [el for el in data.splitlines() if el.strip()]
# ['a, b, c', 'd, e, f', 'g. h, i', 'j, k , l']

Regexp approach

import re
p = re.compile(r"^([^\s]+.+)", re.M)
p.findall(data)
# ['a, b, c', 'd, e, f', 'g. h, i', 'j, k , l']
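
One caveat worth checking before reusing this pattern: it requires a non-whitespace first character followed by at least one more character, so it appears to skip single-character lines and lines that begin with indentation. The sketch below uses a made-up `sample` string (an assumption for illustration, not the question's data) and compares it with a more permissive pattern that accepts any line containing a non-whitespace character.

```python
import re

# Hypothetical sample, not the question's data: it adds a one-character
# line and an indented line to probe the edge cases.
sample = "a, b\nx\n  indented\n \n\nlast"

# Pattern from the answer: first character must be non-whitespace,
# and at least one more character must follow it.
strict = re.findall(r"^([^\s]+.+)", sample, re.M)

# Alternative: any line that contains at least one non-whitespace character.
relaxed = re.findall(r"^.*\S.*$", sample, re.M)

print(strict)   # ['a, b', 'last']
print(relaxed)  # ['a, b', 'x', '  indented', 'last']
```

If the input never contains single-character or indented lines, as in the question, both patterns give the same result.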
crissal

I disagree with your assessment that filtering is more complicated than using regular expressions. However, if you really want to use regex, you could split at multiple consecutive newlines like so:

>>> re.split(r"\n+", data)
['a, b, c', 'd, e, f', 'g. h, i', ' ', 'j, k , l', '']

Unfortunately, this leaves an empty string at the end of your list. To get around this, use re.findall to find everything that isn't a newline:

>>> re.findall(r"([^\n]+)", data)
['a, b, c', 'd, e, f', 'g. h, i', ' ', 'j, k , l']

Since that regex still returns lines that contain only spaces, here's one that skips them:

>>> re.findall(r"^([ \t]*\S.*)$", data, re.MULTILINE)
['a, b, c', 'd, e, f', 'g. h, i', 'j, k , l']

Here's the explanation:

^([ \t]*\S.*)$
^            $   : Start of line and end of line
 (          )    : Capturing group
  [ \t]*         : Zero or more spaces or tabs (i.e. whitespace that isn't a newline)
        \S       : One non-whitespace character
          .*     : Zero or more of any character
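
As a minimal sketch of how this could be packaged, the pattern can be compiled once and wrapped in a small helper. The names `NON_BLANK_LINE` and `non_blank_lines` are hypothetical, chosen only for this example, and the sketch assumes the input uses `\n` line endings, since `.` and `$` only treat `\n` as a line boundary while `splitlines()` also recognizes other separators.

```python
import re

# Matches a line with optional leading spaces/tabs, then at least one
# non-whitespace character, then the rest of the line.
NON_BLANK_LINE = re.compile(r"^([ \t]*\S.*)$", re.MULTILINE)


def non_blank_lines(text):
    """Return the lines of text that contain a non-whitespace character."""
    # Hypothetical helper; assumes "\n" line endings.
    return NON_BLANK_LINE.findall(text)


# The question's data, written with explicit newlines.
data = "a, b, c\nd, e, f\ng. h, i\n \nj, k , l\n\n\n"

result = non_blank_lines(data)
print(result)  # ['a, b, c', 'd, e, f', 'g. h, i', 'j, k , l']

# For this input, the result matches the filtering approach.
print(result == [el for el in data.splitlines() if el.strip()])  # True
```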
            
Pranav Hosangadi