13

What's a good way to test whether a string consists only of whitespace characters, using a regex?

Steffan Harris
  • See also: http://stackoverflow.com/questions/1968702/how-can-i-detect-a-blank-line-in-perl – Alex May 24 '11 at 05:05

3 Answers

16
if ($string =~ /^\s*$/) {
    # is 100% whitespace (remember, 100% of the empty string is also whitespace)
    # use /^\s+$/ if you want to exclude the empty string
}
tobyodavies
  • Note that Perl has default flags for the match operator, so someone may have set the `/m` option somewhere else in the scope. That means that `^` and `$` match at the beginning or end of lines, which allows for text before and after the anchors. To avoid that, use the beginning and end of string anchors: `/\A\s*\z/`. – brian d foy Jun 20 '16 at 20:44
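
A minimal sketch of the two variants from this answer, written with the `\A` and `\z` anchors suggested in the comment. The helper names `is_blank` and `is_only_whitespace` are hypothetical, and the `/m` flag is applied explicitly here only to simulate the default-flag situation the comment describes.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is empty or contains only whitespace.
sub is_blank {
    my ($string) = @_;
    return $string =~ /\A\s*\z/ ? 1 : 0;
}

# Hypothetical helper: true only if the string has at least one
# whitespace character and nothing else (the empty string is excluded).
sub is_only_whitespace {
    my ($string) = @_;
    return $string =~ /\A\s+\z/ ? 1 : 0;
}

# With /m in effect, ^ and $ anchor at line boundaries, so a string
# that merely contains a blank line also matches.
my $text          = "foo\n\nbar";
my $multiline_hit = $text =~ /^\s*$/m  ? 1 : 0;
my $anchored_hit  = $text =~ /\A\s*\z/ ? 1 : 0;

printf "is_blank(''): %d, is_only_whitespace(''): %d\n",
    is_blank(''), is_only_whitespace('');
printf "with /m: %d, with \\A and \\z: %d\n", $multiline_hit, $anchored_hit;
```

In this sketch the `/m` pattern matches `$text` even though it contains `foo` and `bar`, while the `\A...\z` pattern does not.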
9

(I have decided to edit my post to include concepts in the below conversation with tobyodavies.)

In most instances, you want to determine whether or not something is whitespace, because whitespace is relatively insignificant and you want to skip over a string consisting of merely whitespace. So, I think what you want to determine is whether or not there are significant characters.

So I tend to use the reverse test: $str =~ /\S/, which determines the predicate "string contains at least one significant character".

However, to answer your particular question, the same test can be negated: $str !~ /\S/

Axeman
  • 3
    Depends entirely on whether you want "" to match or not. – ysth May 24 '11 at 05:05
  • 1
    not a fan of double negatives - does *not* contain *non* whitespace... unsure about this, but definitely an interesting way of solving the problem! – tobyodavies May 24 '11 at 05:07
  • 2
    @tobyodavies, it depends on the reason that you're filtering space-only strings. In my experience, most times you want to exclude or ignore them. So, this would actually be a positive test. Also, I don't think of `\S` as a negative--but a *complement*. – Axeman May 24 '11 at 12:12
  • @Axeman if you read it as a complement does that mean your English reading of this regex is "contains at least one member of the complement of the set of whitespace characters"? I'm not sure that reads better... Also, if you wanted to ignore whitespace strings then the operator would be different - `if($str=~/\S/){ ... }` so _this answer_ is a double negative – tobyodavies May 25 '11 at 08:22
  • @tobyodavies: "contains non-whitespace". "negotiate" is a tight enough idiom that we never think "to not otiate" (not lounging around). So "non-negotiable" is not seen as the double negative it etymologically is. Actually, "whitespace" is the inversion (a "character" that does not print), but because we have so much whitespace-delimited parsing, it has become a thing in its own right, while in printing it is actually "what does not use ink". So although I say "non-whitespace", I'm thinking "an actual character", or "differs from the background", "requires ink", "something I can use"... – Axeman May 25 '11 at 12:24
  • Just the same way that negotiate means "really get down to business and hash it out" (stop playing around). I am actually suspecting that he wants to determine the predicate "is only whitespace" so he knows that it is *not* significant, and ignore it. Thus this easily turns into `next unless $line =~ /\S/;` or `grep { /\S/ } @lines`. – Axeman May 25 '11 at 12:33
  • @Axeman exactly, you have proved my point - you used `=~` not `!~` as the operator in your example there. that was the second negative, not the whitespace being the opposite of a printed character (which I dispute - whitespace is a sufficiently distinct concept from characters as you can have white space on anything... and etymologically whitespace has no negations in its name) – tobyodavies May 26 '11 at 04:36
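
The tests discussed in this answer and its comments can be sketched as follows. The sample data in `@lines` is made up for illustration. Note that the negated test `!~ /\S/` also accepts the empty string, which is the point raised in the first comment.

```perl
use strict;
use warnings;

# Made-up sample data: two significant lines, three blank ones.
my @lines = ("first", "   ", "", "\t\n", "second");

# Positive test: keep lines with at least one non-whitespace character.
my @significant = grep { /\S/ } @lines;

# Negated test: lines that are empty or contain only whitespace.
my @blank = grep { $_ !~ /\S/ } @lines;

print scalar(@significant), " significant, ", scalar(@blank), " blank\n";
```

Run as written, this should print `2 significant, 3 blank`.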
1

Your regex should be ^\s+$, which requires at least one whitespace character.

In case you were wondering, the ICU regular expression documentation says "white space is defined as [\t\n\f\r\p{Z}]". See http://userguide.icu-project.org/strings/regexp.

\t  Match a HORIZONTAL TABULATION, \u0009.
\n  Match a LINE FEED, \u000A.
\f  Match a FORM FEED, \u000C.
\r  Match a CARRIAGE RETURN, \u000D.
\p{UNICODE PROPERTY NAME}   Match any character with the specified Unicode Property.
tofutim
  • Note that Perl has default flags for the match operator, so someone may have set the `/m` option somewhere else in the scope. That means that `^` and `$` match at the beginning or end of lines, which allows for text before and after the anchors. To avoid that, use the beginning and end of string anchors: `/\A\s*\z/`. – brian d foy Jun 20 '16 at 20:45
  • Also, see [perlrecharclass](http://perldoc.perl.org/perlrecharclass.html) for Perl's definition of whitespace. – brian d foy Jun 20 '16 at 20:50
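
As a rough check, assuming only the ASCII characters listed in this answer plus the plain space, the sketch below confirms that Perl's `\s` matches each of them. It deliberately leaves out Unicode separators such as those in `\p{Z}`; see perlrecharclass for how Perl treats those. The `%candidates` table is a hypothetical name used for illustration.

```perl
use strict;
use warnings;

# Hypothetical lookup table: label => single character to test.
my %candidates = (
    'HORIZONTAL TABULATION' => "\t",
    'LINE FEED'             => "\n",
    'FORM FEED'             => "\f",
    'CARRIAGE RETURN'       => "\r",
    'SPACE'                 => " ",
);

# Count how many of the candidates are matched by \s as a whole string.
my $matched = 0;
for my $name (sort keys %candidates) {
    my $char = $candidates{$name};
    my $hit  = $char =~ /\A\s\z/ ? 1 : 0;
    $matched += $hit;
    printf "%-22s U+%04X matches \\s: %s\n",
        $name, ord($char), $hit ? 'yes' : 'no';
}
```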