
While porting a Perl application to Dart, I have to deal with regular expressions of the form below. The output of both the Perl version and the Dart version is included. The idea is simply to replace basic patterns at the end of a string. To me, the result I get from the Perl fragment is correct, but the results from the Dart version don't seem right. I would appreciate your help understanding where I am going wrong. Thanks in advance.

Perl code:

    my $str ="this is a line of text ‖ ###";
    print("\nIn 1 str=|$str|");
    $str =~ s/###$/\n/g;
    print("\nIn 2 str=|$str|");
    $str =~ s/ ‖ $//g;
    print("\nIn 3 str=|$str|");

Perl output:

    In 1 str=|this is a line of text ‖ ###|
    In 2 str=|this is a line of text ‖ 
    |
    In 3 str=|this is a line of text
    |

Dart code:

    void main() {
      var str = "this is a line of text ‖ ###";
      print("\nIn 1 str=|$str|");
      str = str.replaceAll(RegExp(r'###$'), "\n");
      print("\nIn 2 str=|$str|");
      str = str.replaceAll(RegExp(r' ‖ $'), "");
      print("\nIn 3 str=|$str|");
      print("\n\n");
    }

Dart output:

    In 1 str=|this is a line of text ‖ ###|

    In 2 str=|this is a line of text ‖ 
    |

    In 3 str=|this is a line of text ‖ 
    |

As you can see,

    str = str.replaceAll(RegExp(r' ‖ $'), "");

does not replace the pattern ' ‖ $' with "", unlike its Perl equivalent.

Bid
  • I'm unfamiliar with Dart or its particular flavor of regular expressions, but possibly `$` in it only matches at the end of the string, not before a newline at the end like Perl's does. (That is, does it act like Perl's `\z`?) – Shawn Jul 15 '22 at 21:47
  • You might try `str = str.replaceAll(RegExp(r' ‖\s+$'), "");` (I'm assuming it supports `\s` for whitespace characters.) – Shawn Jul 15 '22 at 21:50
  • The `str = str.replaceAll(RegExp(r' ‖\s+$'), "");` works, but why should it be different? Thanks for your hint. – Bid Jul 15 '22 at 22:09
  • My educated guess was right? Nice. I'll throw up an answer with a bit more detail. – Shawn Jul 15 '22 at 22:11
  • The equivalent to JavaScript's `$` is Perl's `\z` (end of string). Perl's `$` matches at the end of the string *and* before a newline at the end, if there's one. You could emulate this in JavaScript as `(?=\n?$)`, I think. – amon Jul 15 '22 at 22:16
  • The `str = str.replaceAll(RegExp(r' ‖\s+$'), "");` works, but it also deletes the `\n` that was added earlier! That should have been preserved, as in Perl. This is at least a partial solution. Thanks for your hint. – Bid Jul 15 '22 at 22:16
  • What is the `‖` character? If it's not one of the first 128 Unicode code points, that could definitely be an issue in Perl, and possibly also in Dart, whatever that is. – mob Jul 16 '22 at 05:05
  • I believe `‖` is Unicode. I haven't verified it, but there's no issue in Perl. – Bid Jul 16 '22 at 16:32

2 Answers


In Perl-dialect regular expressions, `$` matches either at the end of the string or before a newline that is the last character of the string. (The rules are a bit different in multi-line mode, but you're not using that, so we'll pretend it doesn't exist. `\Z` always has this same behavior, even in multi-line matches, so some people prefer it over `$` for consistency.)

So the regex `/g$/` will match like this:

    some great string\n
                    ^

that is, at the `g` at the end, just before the final newline. There's also `\z`, which always matches at the actual end of the string; `/g\z/` won't match in the example above because of the newline.

In Dart-dialect regular expressions, `$` seems to act like `\z`, so your second replacement wasn't matching because of the newline you added earlier. So if you use

    str = str.replaceAll(RegExp(r' ‖\s+$'), "\n");

it will match as intended and replace all that text with a trailing newline, matching the Perl version. Alternatively, strip off the trailing text first and then append a newline, instead of going the other way around.
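A minimal Dart sketch of the question's two-step pipeline with the `\s+$` fix applied, plus a check showing why the original ` ‖ $` pattern misses: after the first replacement the string ends in `\n`, and Dart's `$` (without `multiLine`) only matches at the very end. The function name `portedPipeline` is hypothetical, introduced only for illustration.

```dart
/// Hypothetical helper reproducing the question's two replacements,
/// using the `\s+$` variant for the second step.
String portedPipeline(String input) {
  // Step 1: replace a trailing "###" with a newline (like Perl's s/###$/\n/).
  var str = input.replaceAll(RegExp(r'###$'), '\n');
  // Step 2: `\s+` also eats the newline added in step 1, so put it back.
  str = str.replaceAll(RegExp(r' ‖\s+$'), '\n');
  return str;
}

void main() {
  const input = 'this is a line of text ‖ ###';

  // The original pattern fails: `$` does not match before the trailing "\n".
  final afterStep1 = input.replaceAll(RegExp(r'###$'), '\n');
  print(RegExp(r' ‖ $').hasMatch(afterStep1)); // false

  // With the fix, the result is the same string the Perl version produces.
  print(portedPipeline(input) == 'this is a line of text\n'); // true
}
```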

Shawn
  • You also don't really need `replaceAll()` for a single replacement, but I don't know what the replace-one-match method is. `replace()` would be logical, but I don't know Dart. – Shawn Jul 15 '22 at 22:19
  • There is no "replace"-only method in Dart; at least, I couldn't get one to work. But I totally agree with your point: trimming the input text and then adding a newline at the end would have worked. The problem is that some input texts don't end with a newline, and they should leave the process without one. Dealing with these exceptional cases just makes the code 10 times larger. Thanks again for your input. – Bid Jul 15 '22 at 22:29
  • You should just stick with Perl. :) – Shawn Jul 15 '22 at 22:43
  • Well, I see your point; in fact, the Perl version is much, much faster! But the final aim is to have an Android and Apple application, and I'm not sure that can be done with Perl. Then again, my knowledge of both languages isn't that deep. – Bid Jul 16 '22 at 16:36
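Regarding the comments above, a sketch under stated assumptions: Dart's `String` does have a single-replacement method, `replaceFirst`, and the trim-then-append approach can be made conditional so that lines without a trailing newline stay without one. The helper name `stripTrailingMarker` is hypothetical, and it assumes the only trailing line terminator is a single `\n`.

```dart
/// Hypothetical helper: removes a trailing " ‖ " marker and restores the
/// final "\n" only if the input had one. Assumes "\n" line endings.
String stripTrailingMarker(String line) {
  final hadNewline = line.endsWith('\n');
  // Work on the text without its final newline, so Dart's `$` is enough.
  final body = hadNewline ? line.substring(0, line.length - 1) : line;
  // replaceFirst replaces at most one match.
  final stripped = body.replaceFirst(RegExp(r' ‖ $'), '');
  return hadNewline ? '$stripped\n' : stripped;
}

void main() {
  final withNewline = stripTrailingMarker('this is a line of text ‖ \n');
  final withoutNewline = stripTrailingMarker('this is a line of text ‖ ');
  print('|$withNewline|'); // newline kept: the closing "|" is on the next line
  print('|$withoutNewline|'); // no newline added
}
```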

`$` does not mean the same thing in the two regex languages.

Dart uses the same regex language as JavaScript, and the post "Reference - What does this regex mean?" says the following:

  • In Perl regex (outside multiline mode), `$` matches before a LF at the end of the string, and it also matches at the very end of the string.

  • In JavaScript and Dart (outside multiline mode), `$` matches only at the very end of the string.

The rows of the following table identify equivalencies:

                        Perl                 JS / Dart
                        m=*        m=0  m=1  m=*                m=0       m=1
    Very end of string  \z                   (?![\s\S])         $
    End of text         \Z         $         (?=\n?(?![\s\S]))  (?=\n?$)
    End of line         (?=\n)|\z       $    (?=\n)|(?![\s\S])  (?=\n)|$  $

(Multiline mode changes the meaning of $. "m=*", "m=0" and "m=1" respectively mean "whether in multiline mode or not", "outside of multiline mode" and "in multiline mode".)
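To check the JS / Dart column of the table empirically, here is a small sketch that lists the offsets where each zero-width "end" assertion matches. The helper `endPositions` and the sample text `'ab\ncd\n'` are hypothetical choices for illustration; the sample contains only `\n` line terminators.

```dart
/// Hypothetical helper: returns the start offset of every match of [pattern].
List<int> endPositions(String pattern, String text, {bool multiLine = false}) =>
    RegExp(pattern, multiLine: multiLine)
        .allMatches(text)
        .map((m) => m.start)
        .toList();

void main() {
  const text = 'ab\ncd\n'; // length 6; the final "\n" is at offset 5

  // Very end of string (Perl \z):
  print(endPositions(r'(?![\s\S])', text)); // [6]
  print(endPositions(r'$', text)); // [6]

  // End of text (Perl \Z, or Perl $ outside multiline mode):
  print(endPositions(r'(?=\n?$)', text)); // [5, 6]

  // End of line (Perl $ in multiline mode):
  print(endPositions(r'(?=\n)|$', text)); // [2, 5, 6]
  print(endPositions(r'$', text, multiLine: true)); // [2, 5, 6]
}
```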

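So, to get Perl's behaviour in Dart, you can use `(?=\n?$)` (in general) or `\s*$` (in this case) instead of `$`. Note that `\s*$` also consumes the trailing newline, so the replacement has to put it back, whereas the lookahead `(?=\n?$)` leaves it in place.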
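Applied to the question's code, the general substitution looks roughly like this. It is a sketch in which every `$` becomes `(?=\n?$)`; the function name `perlStyleReplace` is hypothetical. Because the lookahead matches at the very end or just before a final `\n` without consuming it, the same code handles inputs with and without a trailing newline.

```dart
/// Hypothetical helper: the question's two steps, with `$` replaced by
/// the lookahead `(?=\n?$)` to mimic Perl's non-multiline `$`.
String perlStyleReplace(String input) {
  // `(?=\n?$)` matches at the very end, or just before a final "\n",
  // without consuming that newline.
  var str = input.replaceAll(RegExp(r'###(?=\n?$)'), '\n');
  str = str.replaceAll(RegExp(r' ‖ (?=\n?$)'), '');
  return str;
}

void main() {
  final str = perlStyleReplace("this is a line of text ‖ ###");
  print("In 3 str=|$str|"); // same as the Perl output, newline preserved
}
```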

JavaScript is great, but it really dropped the ball here.

ikegami
  • Thank you so much for your explanation and the comparison table. That is really a lifesaver. I'm facing some 7K lines of Perl code, 90% of which are regexps. The references should certainly help. Once again, many thanks. – Bid Jul 16 '22 at 05:33