
I wrote a parser for a custom (subset of) BBCode in JavaScript and have now translated it to C#. This custom BBCode can be parsed line by line, so I have a regex that lets me "pop" the first line from the BBCode string:

/(^.*$|^.*\r?\n)/

It also matches an empty string. The first part, ^.*$, matches a single-line string such as "Simple string" (no line break at the end). The second part, ^.*\r?\n, matches the first line together with its terminating LF or CRLF.

This works perfectly in JavaScript, but while running the unit tests in C# I noticed a difference.

Assume we have "line1\n" as input.


The regex in Javascript will match it as follows:

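^.*$ won't match, because . matches any character except line terminators and, without the m flag, $ only matches at the very end of the input, which here comes after the \n.

^.*\r?\n will match, because the string starts with zero or more non-terminator characters followed by \n.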


Now in C# it works differently:

^.*$ will match (why?), but only the line1 part. Thus the whole /(^.*$|^.*\r?\n)/ also matches only line1, and the \n goes missing.
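A possible way to see where the difference comes from, sketched in JavaScript: as far as the .NET documentation describes it, `$` without RegexOptions.Multiline matches at the end of the string or just before a final \n, while JavaScript's `$` without the m flag matches only at the very end. The lookahead `(?=\n?$)` below is an assumed stand-in for the .NET anchor, and DOTNET_DOLLAR is a hypothetical name used only here, so this approximates the behavior rather than running the real .NET engine.

```javascript
// Assumption: .NET's default `$` means "end of input, or just before a final \n".
// DOTNET_DOLLAR is a hypothetical stand-in for that anchor, written as a JS lookahead.
const DOTNET_DOLLAR = '(?=\\n?$)';

// The original pattern, with JavaScript's own `$` (end of input only, no m flag).
const jsPattern = /(^.*$|^.*\r?\n)/;

// The same pattern with `$` swapped for the .NET-like anchor.
const dotnetLikePattern = new RegExp('(^.*' + DOTNET_DOLLAR + '|^.*\\r?\\n)');

const input = 'line1\n';

const jsMatch = input.match(jsPattern)[0];
const dotnetLikeMatch = input.match(dotnetLikePattern)[0];

// JSON.stringify makes the trailing newline visible in the log.
console.log(JSON.stringify(jsMatch));         // "line1\n"
console.log(JSON.stringify(dotnetLikeMatch)); // "line1"
```

With the .NET-like anchor the first alternative already succeeds on line1, so the second alternative (the one that would consume the \n) is never tried.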


Could someone please explain? Is there a way to force C# regex to behave like the Javascript regex in the sense described above?

The simplest workaround would be to change the order of the alternatives in the pattern, /(^.*$|^.*\r?\n)/ -> /(^.*\r?\n|^.*$)/, which solves the problem, but I would still like to know the reason behind the difference.
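The reordering workaround can be sketched in JavaScript roughly like this. The helpers popFirstLine and splitLines are hypothetical names for illustration and are not taken from the actual parser; the null guard is an added assumption to cope with input such as a lone \r in the middle of a line, which neither alternative matches.

```javascript
// Reordered pattern: try "line including its terminator" first,
// then fall back to "remaining text without a terminator".
const firstLinePattern = /^.*\r?\n|^.*$/;

// Hypothetical helper: returns the first line (terminator included) and the rest.
function popFirstLine(text) {
  const m = text.match(firstLinePattern);
  // Assumption: if nothing matches (e.g. a stray "\r" mid-line), take everything.
  const line = m ? m[0] : text;
  return { line: line, rest: text.slice(line.length) };
}

// Hypothetical helper: pops lines until the input is exhausted.
function splitLines(text) {
  const lines = [];
  let rest = text;
  while (rest.length > 0) {
    const popped = popFirstLine(rest);
    lines.push(popped.line);
    rest = popped.rest;
  }
  return lines;
}

console.log(splitLines('line1\nline2\r\nline3'));
// [ 'line1\n', 'line2\r\n', 'line3' ]
```

Because the terminator branch is tried first, the result no longer depends on whether `$` is allowed to match before a trailing \n.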

Click here for the C# test code ...

For JavaScript, see below:

const first_line_pattern = /(^.*$|^.*\r?\n)/
const single_string_pattern = /^.*$/
const line_pattern = /^.*\r?\n/

const input4 = "line1\n"

function log(pattern) {
  let m4 = input4.match(pattern)
  console.log('~~~~~~~~' + pattern.toString() + '~~~~~~~~~')
  console.log("'line1\\n':':" + (m4 != null) + ":value: /" + (m4 ? m4[0] : 'no match') + "/")
}

log(first_line_pattern)
log(single_string_pattern)
log(line_pattern)

Thank you for your time!

Alexander Mihailov
  • Use `\z` to match the very end of string in C# – Wiktor Stribiżew Sep 09 '21 at 11:27
  • The `.` in C# matches any char but LF (by default), so `.` in JS is roughly `[^\n\r]` in C#. – Wiktor Stribiżew Sep 09 '21 at 11:34
  • `.` didn't cause any problem in my case. I didn't know that `\z` is the right substitute for `$` in Javascript. Thanks a lot! – Alexander Mihailov Sep 09 '21 at 11:40
  • It is fine, I just wanted you to also understand that part. `.` and anchors behavior is interconnected. `/(^.*$|^.*\r?\n)/` works for you in JS, right? And you want to have the same in C#? Since `$` is not working the same and `.` is not the same, you know you need `@"^[^\r\n]*\z|^[^\r\n]*\r?\n"` in C#. – Wiktor Stribiżew Sep 09 '21 at 11:43
  • Wait. In JavaScript, `.` matches any single character except line terminators (\n, \r), so what is the difference from C#? Could you give an example? – Alexander Mihailov Sep 09 '21 at 11:52
  • See my second comment. `.` in C# is `[^\n]`. In JS, it is `[^\r\n]`. And there are no flags or options that could redefine this behavior. – Wiktor Stribiżew Sep 09 '21 at 11:55
  • Oh, OK, I see now. So if one uses `/(^.*\z|^.*\r?\n)/`, the only difference would be that C# would match something like `"\r"`, whereas JavaScript would not. Likewise, `"\r\r\n"` would be matched in C# but not in JavaScript ... Thank you! – Alexander Mihailov Sep 09 '21 at 12:10
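The `.` difference discussed in the comments above can be checked with a small JavaScript sketch. JavaScript's `$` (no m flag) plays the role of `\z` here, and `[^\n]` is an assumed stand-in for the .NET `.`, which per the comments excludes only \n; the variable names are hypothetical.

```javascript
// JavaScript `.` excludes line terminators such as \n and \r.
const jsFirstLine = /(^.*$|^.*\r?\n)/;

// Assumed stand-in for the C# pattern with `\z`: `.` is replaced by [^\n],
// and JavaScript's `$` (no m flag) behaves like `\z`.
const dotnetLikeFirstLine = /(^[^\n]*$|^[^\n]*\r?\n)/;

// A lone carriage return.
console.log(jsFirstLine.test('\r'));         // false
console.log(dotnetLikeFirstLine.test('\r')); // true

// Two carriage returns followed by a line feed.
console.log(jsFirstLine.test('\r\r\n'));         // false
console.log(dotnetLikeFirstLine.test('\r\r\n')); // true
```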
