I am processing a multiple-line string, with Unix (\n) line breaks.
Some of its lines have the form "A, a" (i.e. upper-case letter, comma, space, lower-case letter), and I want to delete those from the string.
I can accomplish this with a regex replacement, but one thing puzzles me:
A regex that uses "[A-Z]" and "[a-z]" works in both normal mode and multiple-line mode.
A regex that uses "\p{Lu}" and "\p{Ll}" works, but only in normal mode, NOT in multiple-line mode.
EACH OF THESE SUCCEEDS:
$all =~ s/\n\K *[A-Z], [a-z]\n//g; # 1
$all =~ s/^ *[A-Z], [a-z]\n//mg; # 2
$all =~ s/\n\K *\p{Lu}, \p{Ll}\n//g; # 3
BUT THIS FAILS:
$all =~ s/^ *\p{Lu}, \p{Ll}\n//mg; # 4
I expected the /m switch to change the meaning of "^" in the regex, but nothing else. So, I expected statement 4 to work, just like statements 1, 2, and 3. Statement 2 seems to show that the multiple-line syntax is OK, and Statement 3 seems to show that the Unicode character properties match as expected, so, when I combine these, I expect statement 4 to work.
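For anyone who wants to try this, here is a minimal sketch of a test script. It assumes a made-up ASCII sample string; the names $sample, @statements and %result are only for illustration and are not from my real code. It applies each of the four statements to a fresh copy of the same string and prints what is left, with line breaks shown as a literal \n. I have deliberately left out expected output, because the result for statement 4 may depend on the Perl version and on how the string was read and decoded.

```perl
use strict;
use warnings;

# Hypothetical sample: the "B, b" and "  C, c" lines are the ones to delete.
my $sample = "first line\nB, b\nmiddle line\n  C, c\nlast line\n";

# The four substitutions from above, each wrapped so that it works on a copy.
my @statements = (
    [ 1 => sub { my ($s) = @_; $s =~ s/\n\K *[A-Z], [a-z]\n//g;     return $s } ],
    [ 2 => sub { my ($s) = @_; $s =~ s/^ *[A-Z], [a-z]\n//mg;       return $s } ],
    [ 3 => sub { my ($s) = @_; $s =~ s/\n\K *\p{Lu}, \p{Ll}\n//g;   return $s } ],
    [ 4 => sub { my ($s) = @_; $s =~ s/^ *\p{Lu}, \p{Ll}\n//mg;     return $s } ],
);

my %result;
for my $entry (@statements) {
    my ($n, $code) = @$entry;
    $result{$n} = $code->($sample);

    # Show line breaks visibly so the four results are easy to compare.
    (my $shown = $result{$n}) =~ s/\n/\\n/g;
    print "statement $n: $shown\n";
}
```

If all four printed lines are identical, the sample does not trigger the problem in that environment, and the input data or the way it was read would be the next thing to look at.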
I have looked at Tom Christiansen's answer to "Why does modern Perl avoid UTF-8 by default?", but I don't see anything there about multiple-line regex matching, nor have I found an answer elsewhere.