Does \W include the carriage return (\r) or line feed (\n) characters?

Question

IOWs, the negated form of the \w character class. And should I expect different behavior from the different languages I'm using the regex in?

What do you mean by IOWs? And I think the negated form of \w is \W. – Mar 02 '13 at 21:36

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12


Yes, \W includes \r and \n.

\W is the negation of \w and \w contains letters, digits and connecting punctuation characters (like the underscore).

There are three possibilities:

\w is ASCII based ==> [a-zA-Z0-9_]
\w is Unicode based ==> something like [\p{L}\p{Nd}\p{Pc}], meaning letters and digits from all scripts plus some more characters similar to the underscore. See "Unicode" on regular-expressions.info.
The flavour allows you to switch the behaviour of \w with a modifier.
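
The modifier case can be sketched in Python, which is used here only for illustration (the question does not name it). For `str` patterns in Python 3, `\w` is Unicode-aware by default, and the `re.ASCII` flag switches it to `[a-zA-Z0-9_]`. Python's Unicode `\w` is close to, but not defined as, `[\p{L}\p{Nd}\p{Pc}]`. The helper name `is_word_char` is made up for this example.

```python
import re

# Unicode-aware \w (the default for str patterns in Python 3).
WORD_UNICODE = re.compile(r"\w")
# The re.ASCII flag restricts \w to [a-zA-Z0-9_].
WORD_ASCII = re.compile(r"\w", re.ASCII)


def is_word_char(ch: str, pattern: re.Pattern) -> bool:
    """Return True if the single character ch is matched by the given \\w pattern."""
    return pattern.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in ["a", "_", "7", "\u00e9"]:  # \u00e9 is a non-ASCII letter
        print(repr(ch), is_word_char(ch, WORD_UNICODE), is_word_char(ch, WORD_ASCII))
```

Other flavours expose the same switch under different names, so check the documentation of the one you are using.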

But since newline characters are never included in \w, they are included in \W in all cases.
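
A quick way to check this in a given flavour is to test \W against the two characters directly. The following is a minimal sketch in Python, assuming its `re` module as the flavour under test; `is_non_word_char` is a hypothetical helper name, and the same check would need to be repeated in each language you use.

```python
import re


def is_non_word_char(ch: str, flags: int = 0) -> bool:
    """Return True if the single character ch is matched by \\W under the given flags."""
    return re.fullmatch(r"\W", ch, flags) is not None


if __name__ == "__main__":
    for name, ch in [("CR", "\r"), ("LF", "\n"), ("letter", "a")]:
        # Check both the Unicode-aware default and the ASCII-only mode.
        print(name, is_non_word_char(ch), is_non_word_char(ch, re.ASCII))
```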


answered Mar 02 '13 at 22:32

stema


Then 'split /[\W+\n+\r+]/, $multi_line_string;' should be equivalent to 'split /\W+/, $multi_line_string;'? I'm getting a different number of results from each of these. – Jim Black Mar 02 '13 at 22:40
This probably comes from `[\W+\n+\r+]`: the quantifier has to go after the character class. Written this way, the + is just added to the class as a literal character, so each single non-word character counts as its own separator. Compare `[\W\n\r]+` and `\W+`. – stema Mar 02 '13 at 22:43
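
The difference in result counts can be reproduced with a small sketch. It uses Python's `re.split` as a stand-in for the Perl `split` in the comments, so details may differ (Perl's `split` drops trailing empty fields by default, for example). The sample string and the function name `split_three_ways` are made up for illustration.

```python
import re


def split_three_ways(text: str) -> dict:
    """Split text with the three patterns discussed in the comments."""
    return {
        # Quantifiers inside the class: '+' is a literal, so the class matches ONE char.
        "misplaced": re.split(r"[\W+\n+\r+]", text),
        # Quantifier after the class: a run of separators counts as one.
        "class_plus": re.split(r"[\W\n\r]+", text),
        # \W already covers \n and \r, so this behaves like the line above.
        "plain": re.split(r"\W+", text),
    }


if __name__ == "__main__":
    sample = "one, two\r\nthree"
    for name, parts in split_three_ways(sample).items():
        print(name, len(parts), parts)
```

With the misplaced quantifiers, ", " and "\r\n" are each treated as two separate separators, which yields empty strings between them and therefore more results.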

score 0 · Answer 2 · edited May 23 '17 at 11:43

In ASCII-based flavours, \w is shorthand for [a-zA-Z0-9_], so it matches only the letters a-z (lower and upper case), digits and the underscore. The negated form of \w is \W, which matches every character that \w does not.

Read more here.

Basically there are two families of regex syntax, POSIX and Perl-style. In theory, engines from the same family should behave the same regardless of the programming language, but there are some known exceptions. For the differences between Java and .NET (both Perl-style in theory, but not identical in practice), see the thread "Are Java and C# regular expressions compatible?"
