
I'm studying regular expressions from Mastering Regular Expressions, 3rd Edition, and I've come across the statement that $ is a bit more complex than ^, which surprised me as I thought they were "symmetrical", except when they are escaped to mean their literal counterparts.

In fact, on page 129, their descriptions differ slightly, with more words spent on $; however, I'm still confused about it.

  • Regarding ^, only two clear alternatives are described:

Caret ^ matches at the beginning of the text being searched, and, if in an enhanced line-anchor mode, after any newline.

  • As regards $, the description is more obscure to me:

$ [...] matches at the end of the target string, and before a string-ending newline, as well. The latter is common, to allow an expression like s$ (ostensibly, to match "a line ending with s") to match …s<NL>, a line ending with s that's capped with an ending newline.

Two other common meanings for $ are to match only at the end of the target text, and to match before any newline.

The latter two meanings seem pretty symmetric to those described for ^, but what about the string-ending newline meaning?

Searching for [regex] "string-ending newline" only gives three results at the moment, and all of them refer to

$ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.

Enlico
  • The book is about regular expressions in general, but it mainly focuses on Perl, hence the tag in my question. Given the table on the following page, I think the text applies to Perl (as well as to other languages, but not `awk`). – Enlico Mar 07 '20 at 15:42
  • OK, if you're using Perl, you could use `\z` to match just the end of the string, and `\Z` to match the end or before the newline at the end. `$` in Perl is subject to the `m` flag of `m//` or `qr//`. –  Mar 07 '20 at 15:44

4 Answers

4

The zero-width assertion $ asserts the position at the end of the string, or just before a line terminator at the very end of the string (if any).

This becomes clearer with these Perl snippets:

$str = 'abc
foo';
$str =~ s/\w+$/#/;
print "1. <" . $str . ">\n\n";

$str = 'abc
foo
';
$str =~ s/\w+$/#/;
print "2. <" . $str . ">\n\n";

$str = 'abc
foo

';
$str =~ s/\w+$/#/;
print "3. <" . $str . ">\n\n";

This will generate this output:

1. <abc
#>

2. <abc
#
>

3. <abc
foo

>

As you can see, $ matches in cases 1 and 2, because $ matches at the end of the string (case 1) or before a line break right at the end of the string (case 2). However, case 3 remains unmatched, because the line break that follows foo is not the last character of the string.
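To make the three cases above more concrete, here is a minimal sketch that lists every offset at which a bare $ (no /m) can match. The helper name dollar_offsets is hypothetical, introduced only for this illustration; it probes each offset in turn using pos and \G.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $s where a bare $ matches.
sub dollar_offsets {
    my ($s) = @_;
    my @offsets;
    for my $i (0 .. length $s) {
        pos($s) = $i;                        # start the next attempt at offset $i
        push @offsets, $i if $s =~ /\G$/g;   # \G pins the attempt to that offset
    }
    return @offsets;
}

# The same three strings as in the snippets above.
for my $s ("abc\nfoo", "abc\nfoo\n", "abc\nfoo\n\n") {
    (my $shown = $s) =~ s/\n/\\n/g;          # make newlines visible when printing
    print "$shown: ", join(', ', dollar_offsets($s)), "\n";
}
```

In all three strings "foo" ends at offset 7. The offsets reported are 7 for case 1, 7 and 8 for case 2, and 8 and 9 for case 3, so \w+$ has nowhere to anchor directly after "foo" in case 3.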

Enlico
anubhava
  • So, as a consequence, wherever `$` matches, there can be either nothing (case 1) or only a single newline after it (case 2). Is this correct? (Obviously this comment, as your answer, is assuming the `m`ultiline flag is not set.) – Enlico Mar 07 '20 at 15:57
  • Yes, that's right: `MULTILINE` or `m` is not set in any of the 3 cases. It is correct that `$` will match with either nothing after it or a single line break after it at the end. – anubhava Mar 07 '20 at 16:22
4

"String-ending newline" refers to a line feed that is the last character of a string.


Without /m

$ matches before a line feed at the end of the string, and at the end of the string.

"abc\ndef\n" =~ /^abc$/           # Doesn't match at embedded line feed
"abc\ndef\n" =~ /^abc\n$/         # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/      # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/    # Matches at end of string

It's equivalent to \Z, which is equivalent to (?=\n\z|\z).
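One way to check this equivalence empirically is to compare the offsets at which each pattern matches. This is only a sketch: match_offsets is a hypothetical helper and the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets in $s at which the zero-width pattern $re matches.
sub match_offsets {
    my ($re, $s) = @_;
    my @offsets;
    for my $i (0 .. length $s) {
        pos($s) = $i;                           # probe offset $i
        push @offsets, $i if $s =~ /\G$re/g;    # \G pins the attempt to that offset
    }
    return join ' ', @offsets;
}

# Arbitrary sample strings, chosen to cover zero, one and two trailing line feeds.
for my $s ("abc", "abc\n", "abc\ndef\n", "abc\n\n", "") {
    my $dollar = match_offsets(qr/$/,            $s);
    my $big_z  = match_offsets(qr/\Z/,           $s);
    my $look   = match_offsets(qr/(?=\n\z|\z)/,  $s);
    my $verdict = ($dollar eq $big_z && $big_z eq $look) ? 'same' : 'DIFFERENT';
    print "$verdict: [$dollar]\n";
}
```

For "abc\ndef\n" all three patterns report offsets 7 and 8, that is, before the string-ending line feed and at the very end.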

With /m

$ matches before a line feed, and at the end of the string.

"abc\ndef\n" =~ /^abc$/m           # Matches at embedded line feed
"abc\ndef\n" =~ /^abc\n$/m         # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/m      # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/m    # Matches at end of string

It's equivalent to (?=\n|\z).
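The same kind of check can be sketched for the /m case. Again, match_offsets is a hypothetical helper and the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets in $s at which the zero-width pattern $re matches.
sub match_offsets {
    my ($re, $s) = @_;
    my @offsets;
    for my $i (0 .. length $s) {
        pos($s) = $i;                           # probe offset $i
        push @offsets, $i if $s =~ /\G$re/g;    # \G pins the attempt to that offset
    }
    return join ' ', @offsets;
}

# Compare $ under /m with the explicit lookahead on a few arbitrary strings.
for my $s ("abc", "abc\n", "abc\ndef\n", "abc\n\n") {
    my $dollar_m = match_offsets(qr/$/m,        $s);
    my $look     = match_offsets(qr/(?=\n|\z)/, $s);
    my $verdict  = $dollar_m eq $look ? 'same' : 'DIFFERENT';
    print "$verdict: [$dollar_m]\n";
}
```

For "abc\ndef\n" both patterns report offsets 3, 7 and 8: before the embedded line feed, before the string-ending line feed, and at the end of the string.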


\z is used when you want an exact match.

/xyz\z/    # String ends with "xyz"

$ is used when you want to ignore a trailing line feed.

/xyz$/     # Line ends with "xyz". The string might end with a line feed.

For example,

"jkl"   =~ /^jkl$/     # Matches at end of string
"jkl"   =~ /^jkl\z/    # Matches at end of string

"jkl\n" =~ /^jkl$/     # Matches at string-ending line feed
"jkl\n" =~ /^jkl\z/    # Doesn't match at string-ending line feed

$ is useful if matching against lines you haven't chomped yet.

while (<>) {
   next if /^foo$/;
   ...
}

\z is useful the rest of the time.
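As a small illustration of the difference on unchomped lines, the following sketch reads from an in-memory filehandle rather than a real file. The input string and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Made-up input: three lines, the last one without a trailing line feed.
my $input = "foo\nfoobar\nfoo";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my (@dollar_hits, @z_hits);
my $line_no = 0;
while (my $line = <$fh>) {
    $line_no++;
    # $line is not chomped, so lines 1 and 2 still end with "\n".
    push @dollar_hits, $line_no if $line =~ /^foo$/;    # tolerates a trailing line feed
    push @z_hits,      $line_no if $line =~ /^foo\z/;   # requires the string to end right after "foo"
}
close $fh;

print "\$ matched lines: @dollar_hits\n";
print "\\z matched lines: @z_hits\n";
```

Here $ accepts lines 1 and 3, while \z accepts only line 3, the one line that has no trailing line feed.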


Note that other regex engines may behave differently, even those that are Perl-like. For example, in JavaScript, $ without /m only matches at the end of the string.

ikegami
0

The point is that $ matches both before a newline character and at the end of a file or an input string, which may or may not end with a newline character.

0

Please see whether the following code helps clarify the meaning of $ in a regex; the \n substitutions are added for comparison.

use strict;
use warnings;
use feature 'say';

my $str = 'abc
foo
bar
';

my $str_test;

$str_test = $str;
$str_test =~ s/(.+)$/[$1]/;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/';
say '-' x 30;
say $str_test;

$str_test = $str;
$str_test =~ s/(.+)$/[$1]/s;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/s';
say '-' x 30;
say $str_test;

$str_test = $str;
$str_test =~ s/\n/[NL]\n/s;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/s';
say '-' x 30;
say $str_test;

$str_test = $str;
$str_test =~ s/\n/[NL]\n/g;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/g';
say '-' x 30;
say $str_test;

Output

------------------------------
 regex :: s/(.+)$/[$1]/
------------------------------
abc
foo
[bar]

------------------------------
 regex :: s/(.+)$/[$1]/s
------------------------------
[abc
foo
bar
]
------------------------------
 regex :: s/\n/[NL]\n/s
------------------------------
abc[NL]
foo
bar

------------------------------
 regex :: s/\n/[NL]\n/g
------------------------------
abc[NL]
foo[NL]
bar[NL]
Polar Bear