I'm writing a regex to parse markdown admonitions. Here's my test data:

!!! hello
This is a test
with multiple lines
!!!

However, instead of capturing "hello", my code seems to be capturing "hello\r\n", and for the life of me I can't figure out why.

Here's some simplified PHP code exhibiting the issue I'm encountering:

preg_replace('/^!!! (.*)/m', '|$1|', $content);

This replaces the first line of the test data with:

|hello
|

Where the second pipe is on the next line.

When I modify this regex like so:

preg_replace('/^!!! ([a-z]*)/m', '|$1|', $content);

Then I get the expected output for the first line:

|hello|

In multiline mode, the . character is not supposed to match line breaks -- so where is the line break coming from in my capture group that uses the dot?

A few other things I tried:

preg_replace('/^!!! (.*o)/m', '|$1|', $content);

works as expected.

preg_replace('/^!!! (.*)$/m', '|$1|', $content);

still contains the line break, even though I thought $ should be matching before the line break.

I've been trying to figure this out for hours and am clearly missing something (probably obvious). I've also scoured the internet and Stack Overflow looking for clues but have come up empty.

Any and all help appreciated. Thanks!

Tarindel
  • Can't reproduce: https://3v4l.org/g39nA. Maybe you should check what characters/byte sequences your input data actually contains; it appears to be something other than what you thought. – CBroe Sep 21 '21 at 07:14
  • Okay, _can_ reproduce, if the line breaks are actually not just `\n`, but `\r\n`. – CBroe Sep 21 '21 at 07:16
  • That seems like a great clue @CBroe. Thanks for sharing. I was able to repro with your same snippet here: [http://sandbox.onlinephpfunctions.com/code/34fda19b70b2e32b89224dceb08d704c8ce065f2](http://sandbox.onlinephpfunctions.com/code/34fda19b70b2e32b89224dceb08d704c8ce065f2) – Tarindel Sep 21 '21 at 07:30
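
For anyone who wants to check their own input the way CBroe suggests, here is a minimal sketch that makes the captured bytes visible. The `$content` string is a hypothetical stand-in for the real data, assumed here to use `\r\n` line endings; `json_encode` and `bin2hex` are just convenient ways to expose control characters.

```php
<?php
// Hypothetical sample input: the test data from the question, assumed to use \r\n line endings.
$content = "!!! hello\r\nThis is a test\r\nwith multiple lines\r\n!!!\r\n";

// Run the original pattern and keep the capture group for inspection.
preg_match('/^!!! (.*)/m', $content, $m);

// json_encode escapes control characters, so a stray carriage return shows up as \r.
echo json_encode($m[1]), "\n";

// bin2hex shows the raw bytes; a trailing 0d means the capture ends in a carriage return.
echo bin2hex($m[1]), "\n";
```

If the hex dump ends in `0d`, the dot matched the carriage return that sits in front of the `\n`, which would explain why the closing pipe appears to land on the next line.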

1 Answer

@CBroe gave me the key to unlock this mystery. With the line-ending anomaly in hand, I found the following Stack Overflow thread: "How to change what PCRE regexp thinks are newlines in multi-line mode?". That led to the following regex, which does work for me: `/(*ANYCRLF)^!!! (.*)/m`
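
As a rough sketch of the fix in context: the snippet below applies the `(*ANYCRLF)` pattern to a hypothetical `$content` string with `\r\n` line endings. The second pattern, which excludes `\r` and `\n` from the capture with a character class, is not from the linked thread; it is only an assumed alternative for comparison.

```php
<?php
// Hypothetical sample input with \r\n line endings, as suggested in the comments on the question.
$content = "!!! hello\r\nThis is a test\r\nwith multiple lines\r\n!!!\r\n";

// (*ANYCRLF) has to appear at the very start of the pattern. It tells PCRE to
// treat \r, \n, and \r\n as newlines, so the dot no longer matches \r.
$fixed = preg_replace('/(*ANYCRLF)^!!! (.*)/m', '|$1|', $content);

// Assumed alternative (not from the thread): keep the default newline setting
// and exclude both line-break characters from the capture explicitly.
$alt = preg_replace('/^!!! ([^\r\n]*)/m', '|$1|', $content);

// json_encode makes the remaining line endings visible in the output.
echo json_encode($fixed), "\n";
echo json_encode($alt), "\n";
```

Both variants should leave the original `\r\n` after `|hello|` untouched. If the rest of the pipeline does not care about line endings, normalizing them before matching is another option.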

Tarindel