3

I have a text file that has the literal string \r\n in it. I want to replace this with an actual line break (\n).

I know that the regex /\\r\\n/ should match it (I have tested it in Reggy), but I cannot get it to work in PHP.

I have tried the following variations:

preg_replace("/\\\\r\\\\n/", "\n", $line);

preg_replace("/\\\\[r]\\\\[n]/", "\n", $line);

preg_replace("/[\\\\][r][\\\\][n]/", "\n", $line);

preg_replace("/[\\\\]r[\\\\]n/", "\n", $line);

If I just try to replace the backslash, it works properly. As soon as I add an r, it finds no matches.

The file I am reading is encoded as UTF-16.

Edit:

I have also already tried using str_replace().

I now believe that the problem here is the character encoding of the file. I tried the following, and it did work:

$testString = "\\r\\n";
echo preg_replace("/\\\\r\\\\n/", "\n", $testString);

but it does not work on lines I am reading in from my file.

Katfish

5 Answers

5

Save yourself the effort of figuring out the regex and try str_replace() instead:

str_replace('\r\n', "\n", $string);
Michael Berkowski
  • I should have mentioned in my original question that that does not work either. I'll add that in now. – Katfish Aug 17 '11 at 19:45
  • 3
    @Katfish Use single quotes instead of double. `str_replace('\r\n', "\n", $string)` – Wiseguy Aug 17 '11 at 19:46
  • @Wiseguy I read that on http://www.regular-expressions.info/php.html, but it did not make a difference in any of my tests. I have tried it both ways using str_replace and preg_replace. – Katfish Aug 17 '11 at 19:49
4

Save yourself the effort of figuring out the regex and the escaping within double quotes:

$fixed = str_replace('\r\n', "\n", $line);

For what it is worth, preg_replace("/\\\\r\\\\n/", "\n", $line); should be fine. As a demonstration:

var_dump(preg_replace("/\\\\r\\\\n/", "NL", 'Cake is yummy\r\n\r\n'));

Gives: string(17) "Cake is yummyNLNL"

Also fine are: '/\\\r\\\n/' and '/\\\\r\\\\n/'

Important: if the above doesn't work, are you sure that a literal \r\n is what you're trying to match?
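To make the quoting difference concrete, here is a minimal sketch. The sample value in $line is made up for illustration, and the variable names are not from the question.

```php
<?php
// Single quotes keep the backslashes: 4 characters (backslash, r, backslash, n).
$literal = '\r\n';
// Double quotes interpret the escapes: 2 characters (carriage return, line feed).
$control = "\r\n";

// Hypothetical sample line containing the literal text, as described in the question.
$line = 'first\r\nsecond';

// Replace the literal 4-character text with a real line feed.
$fixed = str_replace($literal, "\n", $line);

var_dump(strlen($literal));           // int(4)
var_dump(strlen($control));           // int(2)
var_dump($fixed === "first\nsecond"); // bool(true)
```

This only holds for single-byte-compatible input such as ASCII or UTF-8. If the data is UTF-16, every character occupies two bytes and this search string will not match.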

salathe
  • It is definitely what I am trying to match, but I now suspect that the r and n may not be the same r and n that PHP is using. My file uses UTF-16. – Katfish Aug 17 '11 at 19:56
2

UTF-16 is the problem. If you're just working with the raw bytes, then you can use the full byte sequences for replacing:

$out = str_replace("\x00\x5c\x00\x72\x00\x5c\x00\x6e", "\x00\x0a", $in);

This assumes big-endian UTF-16. For little-endian UTF-16, each zero byte comes after the non-zero byte instead:

$out = str_replace("\x5c\x00\x72\x00\x5c\x00\x6e\x00", "\x0a\x00", $in);

If that doesn't work, please post a byte-dump of your input file so we can see what it actually contains.
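One possible way to produce such a byte dump is sketched below. hexDump() is a hypothetical helper name (not a PHP built-in), and the sample $line stands in for data read from the real file.

```php
<?php
/**
 * Return the first $length bytes of $data as space-separated hex pairs.
 * hexDump() is a hypothetical helper, not part of PHP.
 */
function hexDump(string $data, int $length = 32): string
{
    $hex = bin2hex(substr($data, 0, $length));
    // Guard the empty case, since str_split('') differs between PHP versions.
    return $hex === '' ? '' : implode(' ', str_split($hex, 2));
}

// Stand-in for a line from the file: "a" followed by literal \r\n, in big-endian UTF-16.
$line = "\x00\x61\x00\x5c\x00\x72\x00\x5c\x00\x6e";

echo hexDump($line), "\n"; // 00 61 00 5c 00 72 00 5c 00 6e
```

On a real file you would pass the result of file_get_contents() instead of the sample string. If the dump starts with fe ff, the file carries a big-endian byte order mark; ff fe indicates little-endian.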

Cal
  • That worked perfectly. Thanks! Also, can you tell me where you got the byte values for UTF-16? I failed to find it when I searched earlier. – Katfish Aug 17 '11 at 20:02
  • 2
    If you ignore surrogate pairs, UTF-16 just takes `U+abcd` and encodes it as the 2 bytes `\xab\xcd`. The codes are then just the ASCII bytes for backslash (x5c), 'r' (x72) and 'n' (x6e). 0x0a is the newline you wanted to replace them with – Cal Aug 17 '11 at 20:05
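The byte values described in the comment above can also be generated rather than written by hand. This is a sketch that assumes the mbstring extension is available; the variable names are illustrative only.

```php
<?php
// Requires the mbstring extension (an assumption about the environment).
// '\r\n' in single quotes is the literal 4-character text from the question.
$needleBE  = mb_convert_encoding('\r\n', 'UTF-16BE', 'UTF-8');
$needleLE  = mb_convert_encoding('\r\n', 'UTF-16LE', 'UTF-8');
// "\n" in double quotes is a real line feed (U+000A).
$newlineBE = mb_convert_encoding("\n", 'UTF-16BE', 'UTF-8');

var_dump(bin2hex($needleBE));  // string(16) "005c0072005c006e"
var_dump(bin2hex($needleLE));  // string(16) "5c0072005c006e00"
var_dump(bin2hex($newlineBE)); // string(4) "000a"
```

The generated strings are the same byte sequences as the hard-coded ones in the answer, so they can be passed to str_replace() in the same way.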
2
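$result = preg_replace('/\\\\r\\\\n/', "\n", $subject);

The regex above matches the literal four-character text \r\n and replaces it with an actual line feed. The replacement must be double-quoted ("\n"); in single quotes, '\n' would insert a backslash followed by n rather than a line break.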

Pedro Lobito
1

I always keep searching for this topic, and I always come back to a personal line I wrote.

It looks neat and it's based on RegEx:

 "/[\n\r]/"

PHP

 preg_replace("/[\n\r]/", "\n", $string)

or

 preg_replace("/[\n\r]/",$replaceStr, $string )
  • 1
    You should (re)read the question! They want to replace the **literal string** `\r\n` with a linefeed `\n`. Your regex replaces `\n` or `\r` with `\n` – Toto Jun 02 '20 at 11:03