How to test whether a string is only whitespace in Perl

Question

What's a good way to test whether a string consists only of whitespace characters, using a regex?

See also: http://stackoverflow.com/questions/1968702/how-can-i-detect-a-blank-line-in-perl — Alex, May 24 '11 at 05:05

score 16 · Accepted Answer · answered May 24 '11 at 04:55

if ($string =~ /^\s*$/) {
    # The string is 100% whitespace (note that the empty string also matches).
    # Use /^\s+$/ instead if you want to exclude the empty string.
}
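
To make the difference between the two patterns concrete, here is a minimal sketch. The helper names `is_blank` and `is_blank_nonempty` are made up for illustration and are not part of the answer; only the two regexes come from it.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns from the answer.
# Each returns 1 on a match and 0 otherwise.
sub is_blank          { my ($s) = @_; return $s =~ /^\s*$/ ? 1 : 0 }
sub is_blank_nonempty { my ($s) = @_; return $s =~ /^\s+$/ ? 1 : 0 }

# Compare the two on an empty string, a whitespace-only string,
# and a string that contains a non-whitespace character.
for my $s ('', "  \t", ' a ') {
    printf "[%s] blank=%d blank_nonempty=%d\n",
        $s, is_blank($s), is_blank_nonempty($s);
}
```

The two only disagree on the empty string, so the choice comes down to whether "" should count as "only whitespace" in your program.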

tobyodavies


Note that Perl has default flags for the match operator, so someone may have set the `/m` option somewhere else in the scope. That means that `^` and `$` match at the beginning or end of lines, which allows for text before and after the anchors. To avoid that, use the beginning and end of string anchors: `/\A\s*\z/`. – brian d foy Jun 20 '16 at 20:44
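
A small sketch of the effect described in this comment. To keep it self-contained, the `/m` flag is written explicitly on both matches rather than being set as a scope-wide default; the sample string is invented for illustration.

```perl
use strict;
use warnings;

# A multi-line string whose middle line is blank, but which is
# clearly not whitespace-only as a whole.
my $text = "foo\n   \nbar";

# With /m, ^ and $ match at line boundaries, so the blank middle
# line alone is enough for the pattern to succeed.
my $line_anchored = $text =~ /^\s*$/m ? 1 : 0;

# \A and \z always refer to the start and end of the whole string,
# regardless of /m, so this fails as intended.
my $string_anchored = $text =~ /\A\s*\z/m ? 1 : 0;

print "with ^ and \$ under /m: $line_anchored\n";
print "with \\A and \\z:        $string_anchored\n";
```

If you cannot be sure which default flags are in effect where your code runs, the `\A ... \z` form is the safer choice.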

Axeman · Answer 2 · 2011-05-25T12:50:06.463

9

(I have edited my post to include ideas from the conversation with tobyodavies below.)

In most cases, you test whether something is whitespace because whitespace is relatively insignificant and you want to skip over a string that consists of nothing else. So I think what you really want to determine is whether the string contains any significant characters.

So I tend to use the reverse test, $str =~ /\S/, which evaluates the predicate "string contains at least one significant character".

However, to answer your particular question, you can test the negation: $str !~ /\S/
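
As a rough sketch of both forms, with helper names (`has_content`, `is_blank`) that are invented here for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string has at least one
# non-whitespace character.
sub has_content { my ($s) = @_; return $s =~ /\S/ ? 1 : 0 }

# Hypothetical helper: true if the string has no non-whitespace
# character. Note that this is also true for the empty string.
sub is_blank { my ($s) = @_; return $s !~ /\S/ ? 1 : 0 }

for my $s ('', "  \n", ' x ') {
    printf "[%s] has_content=%d is_blank=%d\n",
        $s, has_content($s), is_blank($s);
}
```

Because `/\S/` has no anchors, it is not affected by whether `^` and `$` are treated as line or string boundaries.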

edited May 25 '11 at 12:50

answered May 24 '11 at 04:59

Axeman


3

Depends entirely on whether you want "" to match or not. – ysth May 24 '11 at 05:05
1

not a fan of double negatives - does *not* contain *non* whitespace... unsure about this, but definitely an interesting way of solving the problem! – tobyodavies May 24 '11 at 05:07
2

@tobyodavies, it depends on the reason that you're filtering space-only strings. In my experience, most times you want to exclude or ignore them. So, this would actually be a positive test. Also, I don't think of `\S` as a negative, but as a *complement*. – Axeman May 24 '11 at 12:12
@Axeman if you read it as a complement does that mean your English reading of this regex is "contains at least one member of the complement of the set of whitespace characters"? I'm not sure that reads better... Also, if you wanted to ignore whitespace strings then the operator would be different - `if($str=~/\S/){ ... }` so _this answer_ is a double negative – tobyodavies May 25 '11 at 08:22
@tobyodavies: "contains non-whitespace". "negotiate" is a tight enough idiom that we never think "to not otiate" (not lounging around). So "non-negotiable" is not seen as the double negative it etymologically is. Actually, "whitespace" is the inversion (a "character" that does not print), but because we have so much whitespace-delimited parsing, it has become a thing in its own right, while in printing it is actually "what does not use ink". So although I say "non-whitespace", I'm thinking "an actual character", or "differs from the background", "requires ink", "something I can use"... – Axeman May 25 '11 at 12:24
Just the same way that negotiate means "really get down to business and hash it out" (stop playing around). I am actually suspecting that he wants to determine the predicate "is only whitespace" so he knows that it is *not* significant, and ignore it. Thus this easily turns into `next unless $line =~ /\S/;` or `grep { /\S/ } @lines`. – Axeman May 25 '11 at 12:33
@Axeman exactly, you have proved my point - you used `=~` not `!~` as the operator in your example there. That was the second negative, not whitespace being the opposite of a printed character (which I dispute - whitespace is a sufficiently distinct concept from characters, as you can have white space on anything... and etymologically "whitespace" has no negations in its name) – tobyodavies May 26 '11 at 04:36
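
For reference, the two idioms mentioned in this thread (`next unless $line =~ /\S/;` and `grep { /\S/ } @lines`) can be sketched as follows. The sample data and variable names are assumptions made up for the example.

```perl
use strict;
use warnings;

# Invented sample input: two significant lines mixed with
# whitespace-only and empty entries.
my @lines = ("first\n", "   \n", "\n", "\tsecond\n", "");

# Filter form: keep only the lines that contain a non-whitespace character.
my @significant = grep { /\S/ } @lines;

# Loop form: skip whitespace-only lines and process the rest.
my @kept;
for my $line (@lines) {
    next unless $line =~ /\S/;
    push @kept, $line;
}

print "grep kept ", scalar(@significant), " lines\n";
print "loop kept ", scalar(@kept), " lines\n";
```

Both forms use the positive `=~` test, since here the goal is to keep the significant lines rather than to identify the blank ones.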

tofutim · Answer 3 · 2011-05-24T05:00:56.260

1

Your regex should be ^\s+$, which requires at least one whitespace character.

In case you were wondering, the ICU user guide says that "white space is defined as [\t\n\f\r\p{Z}]". See http://userguide.icu-project.org/strings/regexp. Note that this is ICU's definition; Perl's own definition of \s may differ.

\t  Match a HORIZONTAL TABULATION, \u0009.
\n  Match a LINE FEED, \u000A.
\f  Match a FORM FEED, \u000C.
\r  Match a CARRIAGE RETURN, \u000D.
\p{UNICODE PROPERTY NAME}   Match any character with the specified Unicode Property.
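
The list above comes from the ICU documentation, so it is worth checking how Perl's own `\s` treats the same characters. The following is a minimal sketch; the sample table and the choice of EM SPACE (U+2003) as a representative of `\p{Z}` are assumptions made for illustration, and the result for non-ASCII characters may depend on your Perl version.

```perl
use strict;
use warnings;

# Sample characters to check, as [name, character] pairs.
# EM SPACE is an arbitrary example of a character in \p{Z}.
my @samples = (
    [ 'HORIZONTAL TABULATION', "\t"       ],
    [ 'LINE FEED',             "\n"       ],
    [ 'FORM FEED',             "\f"       ],
    [ 'CARRIAGE RETURN',       "\r"       ],
    [ 'SPACE',                 ' '        ],
    [ 'EM SPACE',              "\x{2003}" ],
);

# Record whether each single character matches \s on its own.
my %is_space;
for my $pair (@samples) {
    my ($name, $char) = @$pair;
    $is_space{$name} = $char =~ /\A\s\z/ ? 1 : 0;
    printf "%-22s U+%04X matches \\s: %d\n", $name, ord($char), $is_space{$name};
}
```

The perlrecharclass documentation is the authoritative source for exactly which characters `\s` matches in a given Perl version.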

edited May 24 '11 at 05:00

answered May 24 '11 at 04:55

tofutim


Note that Perl has default flags for the match operator, so someone may have set the `/m` option somewhere else in the scope. That means that `^` and `$` match at the beginning or end of lines, which allows for text before and after the anchors. To avoid that, use the beginning and end of string anchors: `/\A\s*\z/`. – brian d foy Jun 20 '16 at 20:45
Also, see [perlrecharclass](http://perldoc.perl.org/perlrecharclass.html) for Perl's definition of whitespace. – brian d foy Jun 20 '16 at 20:50
