
Let us look at some Perl code and its output:

$s = "a\nb\nc\n";
$s =~ s/^b/X/;
print $s;

a
b
c

$s = "a\nb\nc\n";
$s =~ s/^b/X/m;
print $s;

a
X
c

I think Perl is right: ^ matches the position after a newline in the middle of the string only when multiline mode is enabled.

Let us look at Ruby:

$s = "a\nb\nc\n"
print $s.sub(/^b/,'X')

a
X
c

$s = "a\nb\nc\n"
print $s.sub(/^b/m,'X')

a
X
c

The ^ matches the position after a newline in the middle of the text regardless of whether multiline mode is enabled.

For the life of me, I cannot find Ruby documentation which defines what the multiline option will do, where is it?

Also Ruby has no Single line mode (s)?

undefined group option: /(?s)^b/

/^b./s will parse, but it does not behave like Perl's /s (where . matches a newline).

PS: I tested using Perl 5 and Ruby 3.0.

Deepak Rai
puravidaso
  • Unfortunately, large parts of Ruby have no independent written formal specification. The specification for large parts of Ruby is "whatever YARV does". Since I assume you are using YARV here (to my knowledge, no other Ruby implementation implements Ruby 3.0 at the moment), whatever result you are seeing is *by definition* correct. – Jörg W Mittag Jan 31 '21 at 07:50
  • In Ruby (more generally, in Oniguruma) multiline mode (`/m`) is what is called *DOTALL* mode in many other languages: it causes the dot `.` to match line terminators as well as all other characters. In many other languages `/m` causes the anchors `^` and `$` to match the start and ends of a line, rather than the start and end of the string. Ruby doesn't need that because `^` and `$` are line anchors and `\A` and `\z` are string anchors. – Cary Swoveland Jan 31 '21 at 08:07
  • Both are right, because they're different languages with different regex implementations with different ways of doing things. – Shawn Jan 31 '21 at 08:18

2 Answers

Ruby and Perl's /m work differently.


Ruby's /m changes only the behavior of the dot (.). It is equivalent to Perl's /s.

  • Ruby /m: Treat a newline as a character matched by .

  • Perl /s: Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
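
To make the dot behavior concrete, here is a minimal Ruby sketch using the string from the question. The variable names are only for illustration; the calls are plain String#match? and Regexp#options.

```ruby
s = "a\nb\nc\n"

# Without /m, "." stops at a newline, so "a.b" cannot span the line break.
without_m = s.match?(/a.b/)

# With /m, "." also matches "\n" (this is what Perl spells /s).
with_m = s.match?(/a.b/m)

# The flag shows up on the Regexp object as Regexp::MULTILINE.
flag_set = (/a.b/m.options & Regexp::MULTILINE) != 0

puts without_m  # false
puts with_m     # true
puts flag_set   # true
```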

Perl's /m changes the behavior of ^ and $.

  • Perl /m: Treat the string being matched against as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.

^ and $ always work this way in Ruby. Ruby effectively always has Perl's /m.
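
A small sketch of that, again with the question's string. No flag is involved; the only way to get Perl's default ^ behavior here is to switch to \A (variable names are just for the example).

```ruby
s = "a\nb\nc\n"

# "^" matches at the start of every line, so the "b" on line 2 is found.
line_anchor = s.sub(/^b/, "X")

# "^" and "$" together pick out each single-character line.
lines = s.scan(/^\w$/)

# "\A" only matches at the very start of the string, so nothing is replaced.
string_anchor = s.sub(/\Ab/, "X")

p line_anchor    # "a\nX\nc\n"
p lines          # ["a", "b", "c"]
p string_anchor  # "a\nb\nc\n"
```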

Ruby and Perl both share \A, \z, and \Z. \A matches at the beginning of the string, \z at the very end of the string, and \Z at the end of the string or just before a final newline.
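
Here is a rough Ruby illustration of the three string anchors on the question's string, which ends in a newline. The variable names are made up for the example.

```ruby
s = "a\nb\nc\n"   # six characters, the last one is "\n"

starts_with_a = s.match?(/\Aa/)   # "\A" is the start of the string
starts_with_b = s.match?(/\Ab/)   # "b" starts a line, not the string

# "\z" is the absolute end, so "c" directly before it fails here
# because the trailing newline sits in between.
c_at_abs_end = s.match?(/c\z/)

# "\Z" also accepts the position just before a final newline.
c_at_soft_end = s.match?(/c\Z/)

p starts_with_a   # true
p starts_with_b   # false
p c_at_abs_end    # false
p c_at_soft_end   # true
p s.index(/\Z/)   # 5 (just before the final newline)
p s.index(/\z/)   # 6 (the very end)
```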

Which is correct? Neither; each does its own thing. Perl's default behavior for ^ and $ is the same as POSIX regular expressions, but they are incompatible in other ways. Python uses the equivalent of Perl's multiline and single-line modes (MULTILINE and DOTALL). Ruby simplifies the behavior of ^ and $ and makes regexes more explicit.

ikegami
Schwern
  • Thanks for the explanation. The subtle difference does cause confusion, why cannot we agree on each other? – puravidaso Jan 31 '21 at 14:25
  • @puravidaso These languages are 20 to 30 years old, they were making it up as they went along. "Standard" [POSIX Regexes](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html) available at the time are pretty weak sauce by today's standards. Folks *mostly* copied Perl, and this was eventually "standardized" as [PCRE](https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions), but any differences are frozen for backwards compatibility. Programmers were more monolingual then and standardization between languages was not a big concern, just looting the ideas. – Schwern Jan 31 '21 at 21:50

I think Perl is right: ^ matches the position after a newline in the middle of the string only when multiline mode is enabled.

Yes, that is correct. According to man perlre, section Metacharacters, the ^ anchor means:

Match the beginning of the string (or line, if /m is used)

Let us look at Ruby: […] The ^ matches the position after a newline in the middle of the text regardless of whether multiline mode is enabled.

Also correct. According to the documentation of the Regexp class, section Anchors:

^ - Matches beginning of line

For the life of me, I cannot find Ruby documentation which defines what the multiline option will do, where is it?

It is in the documentation of the Regexp class, section Options:

/pat/m - Treat a newline as a character matched by .

Also Ruby has no Single line mode (s)?

In Ruby, you can deactivate a mode by prepending it with a dash -. So, if you are currently in multiline mode, and want to go back to singleline, you don't need a separate mode for that. You can just turn off multiline mode again using -m.
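
A minimal sketch of the inline toggles, using the question's string. Note that switching m off just returns to Ruby's default, where the dot stops at a newline; it is not the same thing as Perl's /s. The last part reproduces the "undefined group option" error from the question through Regexp.new. Variable names are illustrative only.

```ruby
s = "a\nb\nc\n"

# (?m) switches dot-matches-newline on from that point in the pattern.
inline_on = s.match?(/(?m)a.b/)

# (?-m) switches it off again inside a pattern compiled with /m.
inline_off = s.match?(/(?-m)a.b/m)

# (?m:...) limits the option to one group; the second, plain "." still
# refuses to match the newline between "b" and "c".
scoped_both = s.match?(/a(?m:.)b(?m:.)c/)
scoped_one  = s.match?(/a(?m:.)b.c/)

# Ruby's regex syntax has no (?s) group option, so compiling it raises.
s_rejected = begin
  Regexp.new("(?s)^b")
  false
rescue RegexpError
  true
end

p inline_on    # true
p inline_off   # false
p scoped_both  # true
p scoped_one   # false
p s_rejected   # true
```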

Jörg W Mittag
  • It sounds from what you said that Ruby's `/m` causes `.` to match LF? If so Ruby's `/m` is equivalent to Perl's `/s`. (Not the absence of `/m` as you say) – ikegami Jan 31 '21 at 08:52