0

Possible Duplicate:
How to check if text is “empty” (spaces, tabs, newlines) in Python?

I am trying to write a short function to process lines of text in a file. When it encounters a line with significant content (meaning more than just whitespace), it is to do something with that line. The control structure I wanted was

if '\S' in line: do something

or

if r'\S' in line: do something

(I tried the same combinations with double quotes too, and yes, I had imported re.) The condition above, in all the forms I tried, always evaluates to False. In the end, I had to resort to the test

if re.search(r'\S', line) is not None: do something

This works, but it feels a little clumsy in relation to a simple if statement. My question, then, is why isn't the if statement working, and is there a way to do something as (seemingly) elegant and simple?

I have another question unrelated to control structures, but since my suspicion is that it is also related to a possibly illegal use of regular expressions, I'll ask it here. If I have a string

s = " \t\tsome text \t \n\n"

The code

s.strip('\s')

returns the same string complete with spaces, tabs, and newlines (r'\s' is no different). The code

s.strip()

returns "some text". This is despite the fact that strip called with no argument supposedly defaults to stripping whitespace characters, which to my mind is exactly what the expression '\s' describes. Why does the one strip whitespace and the other not?

Thanks for any clarification.

Community
wayeast
  • 2
    this is python, not perl. [`Explicit is better than implicit. Readability counts. ...`](http://www.python.org/dev/peps/pep-0020/) – mata May 22 '12 at 21:20
  • @mata: I can appreciate the value of this aphorism. I just wouldn't have qualified regular expressions as "implicit." To me, they're just a convenient way of covering a lot of bases. – wayeast May 22 '12 at 21:28
  • 1
    What @mata is saying is that Python does not know whether you are using a regular expression or not so it always uses the obvious, a normal string. – Tyler Crompton May 22 '12 at 21:30
  • yes, but in python if you want regex, you must explicitly say so. besides, if you look at perl, the whole language overuses regex way too much for my taste, and is one of the hardest languages to read that I know. I like python for not following perl in this sense. – mata May 22 '12 at 21:32

4 Answers

2

Python's string methods are not aware of regular expressions, so if you want to use regular expressions you have to go through the re module.

However, if you are only interested in finding out whether a string is entirely whitespace, you can use the str.isspace() method:

>>> 'hello'.isspace()
False
>>> '  \n\t  '.isspace()
True
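
As a rough sketch of how isspace() could fit the line-processing loop from the question: the io.StringIO object and its contents below are stand-ins invented for illustration, not anything from the original post.

```python
import io

# Stand-in for a real file; the contents are made up for illustration.
fake_file = io.StringIO("first line\n   \t\n\nsecond line\n")

kept = []
for line in fake_file:
    # Lines read from a file keep their trailing newline, so a "blank"
    # line is still a non-empty, all-whitespace string here.
    if not line.isspace():
        kept.append(line.rstrip("\n"))

print(kept)  # ['first line', 'second line']
```
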
Andrew Clark
  • Thanks for the info. I didn't know that string functions weren't aware of regular expressions (this was my suspicion after this experience, but I guess I missed that explanation in the documentation). Also, thanks for clueing me in to isspace() – wayeast May 22 '12 at 21:31
1

This is what you're looking for:

if not line.isspace(): do something

Also, str.strip does not use regular expressions.
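
To illustrate what str.strip does with an argument, here is a small sketch using the string from the question. The argument is treated as a set of individual characters to remove from both ends, so r'\s' means "a backslash or the letter s", not "whitespace".

```python
s = " \t\tsome text \t \n\n"

# r"\s" is two plain characters: a backslash and the letter 's'.
# s neither starts nor ends with either of them, so nothing is removed.
unchanged = s.strip(r"\s")
print(unchanged == s)  # True

# On a string that does start with 's', that letter is stripped.
demo = "some text".strip(r"\s")
print(demo)  # ome text

# No argument strips whitespace; so does an explicit set of whitespace characters.
default_stripped = s.strip()
explicit_stripped = s.strip(" \t\n")
print(default_stripped)   # some text
print(explicit_stripped)  # some text
```
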

Tyler Crompton
1

If you really just want to find out whether the line consists only of whitespace characters, a regex is overkill. You should go for the following instead:

if text.strip():
    #do stuff

which is basically the same as:

if text.strip() != "":
    #do stuff

Python treats every non-empty string as true in a boolean context. So if text consists only of whitespace characters, text.strip() returns "" and the condition is false.
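
A quick comparison, on a handful of sample strings chosen here for illustration, suggests that the strip() test agrees with the regular-expression test from the question, while not text.isspace() gives a different answer for the empty string (since "".isspace() is False).

```python
import re

samples = ["", " ", "\t\n", "some text", " \t\tsome text \t \n\n"]

rows = []
for text in samples:
    via_strip = bool(text.strip())                  # truthiness of the stripped string
    via_regex = re.search(r"\S", text) is not None  # explicit regular-expression test
    via_isspace = not text.isspace()                # differs for the empty string
    rows.append((text, via_strip, via_regex, via_isspace))
    print(repr(text), via_strip, via_regex, via_isspace)
```

For the empty string the first two tests report no content, whereas not "".isspace() is True, which matters if empty strings can reach the check.
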

devsnd
0

The expression '\S' in line does the same thing as any other string in line test; it tests whether the string on the left occurs inside the string on the right. It does not implicitly compile a regular expression and search for a match. This is a good thing. What if you were writing a program that manipulated regular expressions input by the user and you actually wanted to test whether some sub-expression like \S was in the input expression?
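
A minimal sketch of that point, with made-up sample strings: the in operator looks for the literal two characters backslash and S.

```python
line = "some text\n"
pattern_text = r"a\S+b"  # a string that happens to contain regex source text

in_line = r"\S" in line             # no literal backslash-S in an ordinary line
in_pattern = r"\S" in pattern_text  # the literal characters are present here
plain = "text" in line              # ordinary substring test

print(in_line)     # False
print(in_pattern)  # True
print(plain)       # True
```
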

Likewise, read the documentation of str.strip. Does it say that it will treat its input as a regular expression and remove matching strings? No. If you want to do something with regular expressions, you have to actually tell Python that, not expect it to somehow guess that you meant a regular expression this time while other times you just meant a plain string. While you might think of searching for a regular expression as very similar to searching for a string, they are completely different operations as far as the language implementation is concerned. And most str methods wouldn't even make sense when applied to a regular expression.

Because the match objects returned by re.search are "truthy" in a boolean context (like most class instances), while None is falsy, you can at least shorten your if statement by dropping the is not None test. The rest of the line is necessary to actually tell Python what you want. As for your str.strip case (or other cases where you want to do something similar to a string operation but with a regular expression), have a look at the functions in the re module; there are a number of convenience functions in there that can be helpful. Or else it should be pretty easy to implement a re_strip function yourself.
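
One way this might look in practice is sketched below. The names has_content and re_strip are hypothetical helpers invented for this example (neither exists in the re module), and re_strip is only one possible regex-based analogue of str.strip.

```python
import re

def has_content(line):
    """Return True if line contains at least one non-whitespace character."""
    # re.search returns a match object (truthy) or None (falsy),
    # so no explicit "is not None" comparison is needed.
    return bool(re.search(r"\S", line))

def re_strip(text, pattern=r"\s"):
    """Hypothetical helper: remove leading and trailing runs matching pattern."""
    # \A and \Z anchor the runs to the very start and very end of the string.
    edges = r"\A(?:%s)+|(?:%s)+\Z" % (pattern, pattern)
    return re.sub(edges, "", text)

s = " \t\tsome text \t \n\n"
print(has_content(s))           # True
print(has_content(" \t\n"))     # False
print(re_strip(s))              # some text
print(re_strip("xxhixx", "x"))  # hi
```

For plain whitespace, s.strip() remains the simpler choice; a helper like this only earns its keep when the thing being trimmed really needs a pattern.
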

Ben