1

Say I have a text, represented as a std::string, which contains several different kinds of newline, e.g. \r\n but also just \n or even just \r.

I would now like to unify this by replacing every newline that is not \r\n, namely every lone \r and every lone \n, with \r\n.

A simple boost::replace_all(text, "\n", "\r\n"); unfortunately doesn't work, because it would also replace the \n inside the already valid \r\n sequences.
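To make the problem concrete, here is a small sketch of that naive replacement. It uses std::regex_replace as a stand-in for boost::replace_all so that it needs no Boost, and naive_fix is just a name made up for this illustration:

```cpp
#include <regex>
#include <string>

// Naive approach (hypothetical helper name): every "\n" gets a "\r" put in
// front of it, including the "\n" that is already part of a "\r\n" pair.
// A lone "\r" is not touched at all.
std::string naive_fix(const std::string& text)
{
    return std::regex_replace(text, std::regex("\n"), "\r\n");
}
```

With this, "c\r\nd\n" turns into "c\r\r\nd\r\n", i.e. the existing \r\n gains a second \r.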

I think std::regex should be a good way to handle this... but how should I express this in a regex? Here is some code:

#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string text = "a\rb\nc\r\nd\n";
    std::regex reg(""); // What to put here?
    text = std::regex_replace(text, reg, "\r\n");
    std::cout << text;
}

At the end, the text should be just "a\r\nb\r\nc\r\nd\r\n".

SampleTime

4 Answers

2

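To replace every "\n" that has no preceding "\r", match an optional "\r" together with the "\n", so that existing "\r\n" pairs are simply rewritten to themselves:

std::regex_replace(text, std::regex("\r?\n"), "\r\n");

This also handles consecutive newlines such as "\n\n" and the last newline of a file.

To replace every "\r" that has no following "\n", use a negative lookahead, so that the next character is not consumed and a "\r" at the very end of the text is still matched:

std::regex_replace(text, std::regex("\r(?!\n)"), "\r\n");

Note that the default ECMAScript grammar of std::regex does not support lookbehind, so a pattern like (?<!\r)\n is not an option if you were considering it.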
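Putting the two passes together, a minimal sketch could look like this. The function name to_crlf_two_pass is made up for this example, and the patterns are "\r?\n" for the first pass and the negative lookahead "\r(?!\n)" for the second:

```cpp
#include <regex>
#include <string>

// Hypothetical helper name; normalizes line breaks in two regex passes.
std::string to_crlf_two_pass(const std::string& text)
{
    // Pass 1: a "\n" with or without a preceding "\r" becomes "\r\n".
    static const std::regex lf("\r?\n");
    // Pass 2: a "\r" that is not followed by "\n" becomes "\r\n".
    static const std::regex lone_cr("\r(?!\n)");

    const std::string pass1 = std::regex_replace(text, lf, "\r\n");
    return std::regex_replace(pass1, lone_cr, "\r\n");
}
```

Calling to_crlf_two_pass("a\rb\nc\r\nd\n") should give "a\r\nb\r\nc\r\nd\r\n", and "\n\n" becomes "\r\n\r\n".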

kabanus
  • 1
    The code can't compile. I wrapped the "([^\r])\n" in std::regex(); then it compiles, but it can't deal correctly with "\n\n": the code translates "\n\n" into "\r\n\n". – HarryLeong Mar 01 '21 at 00:10
  • @HarryLeong 4 years later, and finally someone shouts the king is naked! Thanks, you're right on all counts - makes me wonder who upvotes these answers. I'll amend it. – kabanus Mar 01 '21 at 06:25
  • Thank you for your reply. The upvotes indeed made me doubt myself until your confirmation. – HarryLeong Mar 01 '21 at 06:59
2

You could do that in two steps:

  1. \n -> \r\n
  2. \r\r\n -> \r\n

or in one step:

(?:\r\n|\n|\r) -> \r\n

#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string text = "a\rb\nc\r\nd\n";
    text = std::regex_replace(text, std::regex("(?:\\r\\n|\\n|\\r)"), "\r\n");
    std::cout << text;
}
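
Note that the two steps listed above cover \n and \r\n but let a lone \r pass through unchanged. As a rough sketch of a step-by-step variant without regex that also covers a lone \r, one could go through \n as an intermediate form. Both replace_all and to_crlf_steps are hypothetical helpers written for this example, not library functions:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` with `to`, in place.
void replace_all(std::string& s, const std::string& from, const std::string& to)
{
    if (from.empty()) return; // nothing sensible to search for
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text
    }
}

// Hypothetical helper: normalizes all line breaks to "\r\n" in three steps.
std::string to_crlf_steps(std::string text)
{
    replace_all(text, "\r\n", "\n"); // collapse existing pairs to "\n"
    replace_all(text, "\r", "\n");   // lone "\r" -> "\n"
    replace_all(text, "\n", "\r\n"); // every line break -> "\r\n"
    return text;
}
```

The one-step regex does the same job in a single pass; the stepwise version is mainly useful when regex is not wanted.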
Pavel P
2
std::regex reg("\r\n|\r|\n");

should match.

More info here:

Match linebreaks - \n or \r\n?

didiz
  • 1
    A better pattern is `\r\n?|\n`. Alternatives should be matching in different locations as a best practice. – Wiktor Stribiżew May 06 '17 at 16:43
  • @WiktorStribiżew: "Better" by what metric? What does "Alternatives should be matching in different locations" mean? – Lightness Races in Orbit May 06 '17 at 16:56
  • 1
    Better performance, surely not the looks. It's hard to type on a cellphone. I meant to write: "the best practice is to write alternatives in such a way that they could not match at the same location in the string." That way you get rid of unnecessary backtracking. – Wiktor Stribiżew May 06 '17 at 17:34
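
For reference, the pattern suggested in the comments above, \r\n?|\n, can be tried with a sketch like the following. The function name to_crlf is made up for this example; the pattern is passed with real control characters, which std::regex treats as ordinary characters:

```cpp
#include <regex>
#include <string>

// Hypothetical helper name; replaces every kind of line break with "\r\n".
std::string to_crlf(const std::string& text)
{
    // Either a "\r" optionally followed by "\n", or a "\n" on its own.
    // The two alternatives cannot start at the same position.
    static const std::regex line_break("\r\n?|\n");
    return std::regex_replace(text, line_break, "\r\n");
}
```

For the question's input "a\rb\nc\r\nd\n" this should produce "a\r\nb\r\nc\r\nd\r\n".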
1

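\R stands for any kind of line break, i.e. \n, \r or \r\n. It is a Perl/PCRE-style escape, though, and is not part of the ECMAScript grammar that std::regex uses by default.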

Toto