
I am using the following regular expression:

$test="a\n";

if ($test =~ /^.$/) {
    print "Test Passed\n";
} else {
    print "Test Failed\n";
}

For the aforementioned test variable, the regular expression finds the pattern.

However, if I change the variable to the following value it fails to identify the pattern.

$test="\na";

I know that my expression matches a single character, i.e. the target should start and end with a single character.

John Rambo

2 Answers


Concise answer

If you need to check whether a string consists of exactly one character (any character, including a newline), use

/^.\z/s

Explanation

The problem stems from the fact that, without the /m modifier, $ matches at the end of the string or just before a final newline, not only at the very end of the string. Here, $ = \Z.

By default, $ matches at the end of the string and at the position before a final newline. Thus, a\n passes the if ($test =~ /^.$/ ) test, but \na does not: . cannot match a newline, and the newline is at the start of the string rather than at the end. \na fails with if ($test =~ /^.$/ ) and also with if ($test =~ /^.$/s ), because even when . matches the newline, $ cannot match before the a.

Note that you can use the \z anchor, which forces the regex engine to match only at the very end of the string. With it, both test cases fail, even with the /s (DOTALL) modifier. Use /^.\z/ if you need that behavior, or /^.\z/s to also match a single newline.
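To make the differences concrete, here is a minimal sketch that runs the four patterns discussed above against the two strings from the question, plus a lone newline and a lone a. The hashes %inputs and %patterns and the helper matches() are names made up for this illustration; they are not part of the question.

```perl
use strict;
use warnings;

# Inputs, keyed by a printable label (single quotes keep the backslash literal).
my %inputs = (
    'a\n' => "a\n",
    '\na' => "\na",
    '\n'  => "\n",
    'a'   => "a",
);

# Patterns from the discussion, keyed by how they are written.
my %patterns = (
    '/^.$/'   => qr/^.$/,
    '/^.$/s'  => qr/^.$/s,
    '/^.\z/'  => qr/^.\z/,
    '/^.\z/s' => qr/^.\z/s,
);

# Hypothetical helper: 1 if the named pattern matches the named input, else 0.
sub matches {
    my ($pattern_name, $input_name) = @_;
    return $inputs{$input_name} =~ $patterns{$pattern_name} ? 1 : 0;
}

for my $p (sort keys %patterns) {
    for my $i (sort keys %inputs) {
        printf "%-8s on %-4s => %s\n", $p, $i,
            matches($p, $i) ? "match" : "no match";
    }
}
```

Only /^.\z/s accepts every one-character string, including "\n", while rejecting both two-character strings.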

Also, see What's the difference between \z and \Z in a regular expression and when and how do I use it?

Wiktor Stribiżew
  • I didn't get why the first test case is passing, as there are two characters present in the input i.e. a **new line**, and a character **a**. – John Rambo Feb 04 '16 at 21:55
  • I have explained it. The `$` is `\Z`, matching right at the end, or before the last newline in the string. Your string `a\n` has 1 non-whitespace character and it is matched with `.`. Then, the `$` matches the position in front of the last newline. Bingo! – Wiktor Stribiżew Feb 04 '16 at 21:59

You have two problems.

First, $ does not match only at the end of the string. In the absence of the /m flag, it is equivalent to \Z, which matches either at the end of the string or just before a newline character at the end of the string.
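One way to see this is to list the offsets at which each anchor can match. The sketch below does that for the sample string "a\nb\n"; the helper anchor_positions() and the sample string are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every offset in $string where the
# zero-width pattern $anchor matches.
sub anchor_positions {
    my ($anchor, $string) = @_;
    my @positions;
    # After a zero-length match, Perl does not allow another zero-length
    # match at the same offset, so this loop terminates.
    while ($string =~ /$anchor/g) {
        push @positions, pos($string);
    }
    return @positions;
}

my $sample = "a\nb\n";    # length 4, newlines at offsets 1 and 3

my @cases = (
    [ '$'          => qr/$/   ],
    [ '\Z'         => qr/\Z/  ],
    [ '\z'         => qr/\z/  ],
    [ '$ with /m'  => qr/$/m  ],
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    printf "%-10s matches at: %s\n", $label,
        join(", ", anchor_positions($re, $sample));
}
```

Without /m, $ and \Z both match at offsets 3 and 4 (before the final newline and at the very end), \z matches only at offset 4, and $ with /m additionally matches at offset 1.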

Almost always this is not what you intend and you should use \z which only matches at the end of the string. Pretty much any code that uses

Secondly, . does not match every character by default. Unless you supply the /s flag, it matches any character except \n.

So your regex /^.$/ will match:

  1. a single character that is not a newline, or
  2. two characters, where the first is not a newline but the second is

To match a single character, use /^.\z/s (or just length($string) == 1).
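As a rough check that the two suggestions agree, the sketch below compares them on a handful of strings. The subroutine names is_single_char_regex() and is_single_char_length() are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this comparison.
sub is_single_char_regex {
    my ($s) = @_;
    return $s =~ /^.\z/s ? 1 : 0;    # /s lets . match "\n"; \z is the true end
}

sub is_single_char_length {
    my ($s) = @_;
    return length($s) == 1 ? 1 : 0;
}

for my $s ("a", "\n", "a\n", "\na", "") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-6s regex=%d length=%d\n", "\"$shown\"",
        is_single_char_regex($s), is_single_char_length($s);
}
```

For these inputs both checks agree, so length() is the simpler choice when all you need is the character count.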

ysth