1

Across different text files (and sometimes inside a single file) I have mixed end-of-line combinations (see the example below).

How can I normalize every combination of CR and LF to one single CRLF using PHP? I can do the replacement with str_replace, but my problem is finding the right search string to use.

$textfile=str_replace("search string i need","CRLF to replace", $textfile);

Example of a generic text file to fix:

text line 1 CRLFLFCRCRLF
text line 2 LFLFCRLFCRCR
text line 3 CRLF
text line 4 CR
text line 5 LF

I need to replace every random combination of \r and \n with a single \r\n, like this:

text line 1 CRLF
text line 2 CRLF
text line 3 CRLF
text line 4 CRLF
text line 5 CRLF
  • Welcome to SO! Please read [this guide](http://stackoverflow.com/help/how-to-ask) on how a question should be formulated to increase your chances of getting an answer you can use. As it is right now, there are too many different good answers because the question is too broad. You could narrow it down by including what you have tried yourself. – Demitrian Jan 02 '17 at 22:40
  • Possible duplicate of [How to replace different newline styles in PHP the smartest way?](http://stackoverflow.com/questions/7836632/how-to-replace-different-newline-styles-in-php-the-smartest-way) – Jonathan Argentiero Jan 03 '17 at 13:30
  • I modified the question. Is it better now? Sorry, it's my first question; the next one will be better. Thanks for your understanding. – Kevin White Jan 05 '17 at 18:51

3 Answers

4

PCRE has an alias for any newline combination: \R

You can use it like this, which replaces each newline sequence with \r\n:

$text = preg_replace('~\R~', "\r\n", $text);

In 8-bit mode, \R matches CR, LF, or CRLF, but also the vertical tab (VT), the form feed (FF) and the next line character (NEL).
In other words, \R is an alias for (?>\r\n|\n|\x0b|\f|\r|\x85). VT, FF and NEL are rarely (never?) used today, but if they matter to you, it's possible to restrict \R to only CR, LF and CRLF using (*BSR_ANYCRLF) at the start of the pattern:

$text = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $text);
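To illustrate the restriction, here is a minimal sketch that runs the (*BSR_ANYCRLF) pattern on a string containing a form feed. The function name normalizeCrLfOnly and the sample string are made up for this example and are not part of the original answer.

```php
<?php
// Hypothetical helper: convert CR, LF and CRLF to CRLF,
// leaving VT, FF and NEL untouched.
function normalizeCrLfOnly(string $text): string
{
    // (*BSR_ANYCRLF) has to appear at the very start of the pattern.
    return preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $text);
}

// Made-up sample: a form feed between two "pages", then a lone LF.
$sample = "page 1\fpage 2\nend";

// json_encode is used only to make the control characters visible.
echo json_encode(normalizeCrLfOnly($sample)), "\n";
```

With this pattern the form feed should come through unchanged, while the lone LF becomes CRLF.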

If you want to extend the meaning of \R to any Unicode newline sequence, use the u modifier:

$text = preg_replace('~\R~u', "\r\n", $text);

Concretely, it adds the Line Separator U+2028 and the Paragraph Separator U+2029 to the list of newline sequences.

Note that \R is an alias and not a shorthand character class, so you can't put it inside a character class.
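Because \R on its own replaces newline sequences one-for-one, a run such as CRLF LF CR CRLF from the question would turn into four CRLFs. Although \R cannot go inside a character class, it can take a quantifier directly, so a run can be collapsed with \R+. The sketch below applies that to the sample from the question; the helper name collapseNewlines is an assumption made for illustration.

```php
<?php
// Hypothetical helper: collapse each run of CR/LF sequences into one CRLF.
function collapseNewlines(string $text): string
{
    // \R+ matches one or more consecutive newline sequences.
    return preg_replace('~(*BSR_ANYCRLF)\R+~', "\r\n", $text);
}

// The sample file from the question, written with real control characters.
$sample = "text line 1\r\n\n\r\r\n"
        . "text line 2\n\n\r\n\r\r"
        . "text line 3\r\n"
        . "text line 4\r"
        . "text line 5\n";

// json_encode is used only to make the control characters visible.
echo json_encode(collapseNewlines($sample)), "\n";
```

Keep in mind that collapsing runs also removes intentionally empty lines.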


With the intl transliterator.

Using the intl transliterator instead of a simple replacement function (regex or not) can be worthwhile, in particular if you need to apply other modifications to your strings as well. All of them can be centralized in a single set of rules:

$tls = Transliterator::createFromRules('[\r\n]+ > \r\n;');
$text = $tls->transliterate($text);
Casimir et Hippolyte
2

To replace all combinations of '\r' and '\n' with '\r\n' use:

$result = preg_replace('/[\r\n]+/', "\r\n", $text);

This will also replace single '\r' or '\n' with '\r\n'.
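One side effect worth knowing: because [\r\n]+ matches a whole run, intentionally empty lines disappear as well. The sketch below contrasts it with a one-for-one alternation that keeps them; the sample text and variable names are made up for illustration.

```php
<?php
// Made-up sample: two paragraphs separated by an empty line, ending in a lone CR.
$text = "first paragraph\n\nsecond paragraph\r";

// Collapses runs: the empty line between the paragraphs is removed.
$collapsed = preg_replace('/[\r\n]+/', "\r\n", $text);

// Converts each line ending one-for-one: the empty line survives.
// "\r\n" is listed first so a CRLF pair is treated as a single line ending.
$converted = preg_replace('/\r\n|\r|\n/', "\r\n", $text);

// json_encode is used only to make the control characters visible.
echo json_encode($collapsed), "\n";
echo json_encode($converted), "\n";
```

Which of the two is appropriate depends on whether blank lines carry meaning in the input.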

Andie2302
  • I recently started running into issues when parsing raw emails. This replace seems to be fixing all cases that caused issues for me, while the others in this page only solved some. – Arie Jun 29 '22 at 14:50
1

You don't really need regex for that:

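str_replace("\n", "\r\n", str_replace(["\r\n", "\r"], "\n", $str));

The inner call first converts every \r\n and every lone \r to \n; the outer call then turns each \n into \r\n. A single call with ["\r\n", "\r", "\n"] as the search array does not work here, because str_replace applies the searches one after another to the already-modified string, so the \r and \n of every inserted \r\n get replaced again. Note that this converts each line ending individually and does not collapse runs of consecutive line endings.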
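One thing to watch with the array form of str_replace: PHP applies each search in turn to the result of the previous replacement, so a replacement string that itself contains \r or \n gets processed again. The following sketch, with a made-up input string, compares a single call against a two-pass approach that goes through "\n" as an intermediate form.

```php
<?php
// Made-up sample with one CRLF, one lone CR and one lone LF.
$str = "a\r\nb\rc\nd";

// Single call: each search runs on the output of the previous one,
// so the \r and \n inside freshly inserted "\r\n" are replaced again.
$singleCall = str_replace(["\r\n", "\r", "\n"], "\r\n", $str);

// Two passes: first reduce everything to "\n", then expand "\n" to "\r\n".
$twoPass = str_replace("\n", "\r\n", str_replace(["\r\n", "\r"], "\n", $str));

// json_encode is used only to make the control characters visible.
echo json_encode($singleCall), "\n";
echo json_encode($twoPass), "\n";
```

The single call multiplies the line endings, which matches the symptom reported in the comments; the two-pass version yields exactly one CRLF per original line ending.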

Dekel
  • Parse error: syntax error, unexpected '[', expecting ')' – Kevin White Jan 02 '17 at 23:04
  • Which version of PHP are you using? (for old versions use `Array("\r\n", "\r", "\n")` instead of `["\r\n", "\r", "\n"]` ) – Dekel Jan 02 '17 at 23:04
  • Now the error is fixed, but the CRLFs are multiplied instead of fixed. Here is the code where I use your suggested string: $str1=file_get_contents($_FILES['uploaded']['tmp_name']); $str2=str_replace(Array("\r\n", "\r", "\n"), "\r\n", $str1); file_put_contents($_FILES['uploaded']['tmp_name'], $str1); Where is my mistake? – Kevin White Jan 02 '17 at 23:22
  • You didn't use `$str2` after you put the changes inside. – Dekel Jan 02 '17 at 23:24
  • Thanks for your answer. I followed your suggestion and the string I used is: $str1 = str_replace(array("\n", "\r"), "
    ", $str1); Unfortunately it replaces the
    multiple times each end of file, while I need only one
    each end of line. How can I modify it?
    – Kevin White Jan 05 '17 at 17:19
  • Not sure what is the data you have so it's a bit hard to help. – Dekel Jan 05 '17 at 18:40
  • Dekel, if you read the question again (I modified it a few minutes ago), maybe it is clearer now. Is it? – Kevin White Jan 05 '17 at 18:48
  • In such case you will have to use regex. You can use `preg_replace('#(\R+)#u', "\r\n", $text);` it should work. – Dekel Jan 05 '17 at 18:57
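Pulling the thread together, a corrected version of the file round trip from the comments might look like the sketch below. It writes the normalized string back (not the original one) and collapses runs with \R+. The helper name normalizeFileEol is hypothetical, a temporary file stands in for the uploaded one, and (*BSR_ANYCRLF) is used here instead of the u modifier on the assumption that the file is not guaranteed to be valid UTF-8.

```php
<?php
// Hypothetical helper: rewrite a file in place with CRLF line endings,
// collapsing runs of CR/LF into a single CRLF as the question asks.
function normalizeFileEol(string $path): bool
{
    $original = file_get_contents($path);
    if ($original === false) {
        return false; // file could not be read
    }

    $normalized = preg_replace('~(*BSR_ANYCRLF)\R+~', "\r\n", $original);
    if ($normalized === null) {
        return false; // regex engine reported an error
    }

    // Write the normalized string, not the original one.
    return file_put_contents($path, $normalized) !== false;
}

// Demo on a temporary file standing in for the uploaded one.
$tmp = tempnam(sys_get_temp_dir(), 'eol');
file_put_contents($tmp, "text line 1\r\n\n\r\r\ntext line 2\r");
normalizeFileEol($tmp);

// json_encode is used only to make the control characters visible.
echo json_encode(file_get_contents($tmp)), "\n";
unlink($tmp);
```

For an actual upload, the path would be $_FILES['uploaded']['tmp_name'] as in the comment above.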