Given a standard string object in C++ that is expected to have escape sequences in it, how can I convert that into a quoted version of itself at runtime?

std::string str("Foo said, \"bar\"\n");

Given the above, I want to create a new string, quoted, that has the following contents:

"\"Foo said, \\\"bar\\\"\\n\""

This is just an example. I need to perform this manipulation with arbitrary strings.

Will I simply have to do this manually for every possible escape sequence?

Chuck
  • yes. though you only really have to escape `\` and `"`, the rest only for better readability. – Deduplicator Aug 04 '14 at 01:41
  • Just make a function that does this for you. Add `\"` to the start and `\"` to the end. – OMGtechy Aug 04 '14 at 01:50
  • Do you need the string to be readable? It would probably be easier to escape every single character than to try and special case all the ones that actually need escaping. Using the Unicode representations would be technically correct, although totally illegible. – Mike Precup Aug 04 '14 at 01:51
  • Roger Pate's answer [to this question](http://stackoverflow.com/questions/2417588/escaping-a-c-string) is probably the kind of thing you want, if you're just handling ASCII text.... – Tony Delroy Aug 04 '14 at 01:55
  • Shall that be runtime-escaping or compile-time escaping? If you have the choice, prefer the latter. – Deduplicator Aug 04 '14 at 02:00
  • @Mike Precup: It does not need to be human readable. – Chuck Aug 04 '14 at 02:04
  • @Deduplicator: runtime. – Chuck Aug 04 '14 at 02:07
  • @Chuck Looks to me that you'll need an appropriate [`std::regex`](http://en.cppreference.com/w/cpp/regex/basic_regex) to process such at runtime. – πάντα ῥεῖ Aug 04 '14 at 02:16

2 Answers

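C++14 adds the `std::quoted` manipulator (declared in `<iomanip>`), which wraps a string in quotes and escapes any embedded quote or escape character. Other characters, such as newlines, are left unchanged. For example: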

#include <iomanip>   // std::quoted
#include <iostream>
#include <sstream>

std::cout << std::quoted(str); // defaults to '"' as the quote and '\\' as the escape

std::ostringstream oss;
oss << std::quoted(str);
auto quoted_string = oss.str();
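
As a rough sketch of how this behaves in practice, the two helpers below (`quote` and `unquote` are hypothetical names, not part of the standard library) wrap `std::quoted` for output and input. Note that only the quote and escape characters are escaped, so a newline stays a raw newline rather than becoming `\n`; if the output in the question is required exactly, this manipulator alone is not enough.

```cpp
#include <iomanip>   // std::quoted (C++14)
#include <sstream>
#include <string>

// Hypothetical helper: wrap s in double quotes, prefixing each embedded
// '"' and '\\' with a backslash. All other characters are copied as-is.
std::string quote(const std::string& s) {
    std::ostringstream oss;
    oss << std::quoted(s);
    return oss.str();
}

// Hypothetical helper: inverse of quote(). Strips the delimiters and
// removes the backslash in front of each escaped character.
std::string unquote(const std::string& s) {
    std::istringstream iss(s);
    std::string out;
    iss >> std::quoted(out);
    return out;
}
```

Because the same manipulator works on input streams, `unquote(quote(s))` should give back `s`, which makes it convenient for round-tripping user-supplied text through a stream.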
user657267

If you are dealing with a string literal, you can pass it through a macro and stringify it.

#define STRINGIFY(X) #X
std::string str(STRINGIFY("Foo said, \"bar\"\n"));
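
To make the effect concrete, here is a small sketch using the literal from the question. The function name `stringified_example` is made up for illustration; the macro is the one shown above.

```cpp
#include <string>

// The preprocessor's # operator produces a string literal containing the
// argument's source spelling, inserting a backslash before each '"' and '\\'
// that appears inside a string literal argument.
#define STRINGIFY(X) #X

// Hypothetical helper: returns the source spelling of the question's literal,
// including the surrounding quotes and the two-character sequence \n.
std::string stringified_example() {
    return STRINGIFY("Foo said, \"bar\"\n");
}
```

The returned string has the contents the question asks for, but the conversion happens at compile time, so it only applies to text that is written literally in the source.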

If the text is already stored in a string, and you want a version of it that can be emitted as a string literal in C source code representing the same contents, you need to apply your own stringification to the text. This is a little tricky because some of the control characters have their own escaped representations:

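#include <cctype>    // std::isprint
#include <iomanip>   // std::setw, std::setfill
#include <sstream>
#include <string>

// Format c as four zero-padded hex digits, e.g. 7 -> "0007".
std::string hex (int c) {
    std::ostringstream oss;
    oss << std::setw(4) << std::setfill('0') << std::hex << c;
    return oss.str();
}

// Return str as a double-quoted, escaped string literal.
std::string stringify (const std::string &str) {
    std::ostringstream oss;
    oss << '"';
    for (std::string::size_type i = 0; i < str.size(); ++i) {
        unsigned char c = str[i];  // unsigned so that std::isprint(c) is well defined
        switch (c) {
        case '\t': oss << "\\t";     break;
        case '\n': oss << "\\n";     break;
        case '\a': oss << "\\a";     break;
        case '\b': oss << "\\b";     break;
        case '\r': oss << "\\r";     break;
        case '\v': oss << "\\v";     break;
        case '\f': oss << "\\f";     break;
        case '"':  oss << "\\\"";    break;
        case '\\': oss << "\\\\";    break;
        default:
            // Printable characters pass through; anything else becomes \uXXXX.
            if (std::isprint(c)) oss << c;
            else oss << "\\u" << hex(c);
            break;
        }
    }
    oss << '"';
    return oss.str();
}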

If the output must not form trigraphs such as `??/`, extend this function to escape '?' as `\?`.
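
As a comment on this answer notes, an octal escape with exactly three digits is another safe choice, because an octal escape never consumes more than three digits and so cannot swallow the characters that follow it. The sketch below is one possible variant under that assumption; `stringify_octal` is a hypothetical name, and it handles only a subset of the named escapes to stay short.

```cpp
#include <cctype>   // std::isprint
#include <string>

// Hypothetical variant of stringify(): every byte that has no short escape
// and is not printable is written as a backslash plus exactly three octal
// digits, so a digit or letter that follows cannot extend the escape.
std::string stringify_octal(const std::string& str) {
    std::string out = "\"";
    for (unsigned char c : str) {
        switch (c) {
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        case '\r': out += "\\r";  break;
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        default:
            if (std::isprint(c)) {
                out += static_cast<char>(c);
            } else {
                // Emit the three octal digits, most significant first.
                out += '\\';
                out += static_cast<char>('0' + ((c >> 6) & 7));
                out += static_cast<char>('0' + ((c >> 3) & 7));
                out += static_cast<char>('0' + (c & 7));
            }
            break;
        }
    }
    out += '"';
    return out;
}
```

With this scheme a null byte followed by `'a'` comes out as `\000a`, which a compiler reads back as the same two bytes.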

jxh
  • These are not going to be string literals but supplied by the user. – Chuck Aug 04 '14 at 02:01
  • That's illustrative, though you could take `str` by `const&`, it'd be nice to have octal representation for other control characters, and IMHO `at()` is overkill for such a simple function that could conceivably be called very, very often - not a bad practice to show anyone who needs to look this up on the web though ;-). – Tony Delroy Aug 04 '14 at 02:36
  • @TonyD: Strings are often reference counted and copy on write, so I usually don't worry too much about it, but I have fixed things as you suggested. – jxh Aug 04 '14 at 02:58
  • @jxh: +1 / interesting topic COW - I gather its use in Standard library implementations has fallen by the wayside these last few years... see e.g. [this question](http://stackoverflow.com/questions/12199710/legality-of-cow-stdstring-implementation-in-c11). Cheers. – Tony Delroy Aug 04 '14 at 04:47
  • Your escaping to `\x` doesn't work. Consider what happens if the string contains a null character followed by an `'a'`; your code would insert a new line in place of the two characters. (Escaping with `\x` is inherently unsafe, since the length of the sequence isn't bound. I generally use `\u`, with exactly 4 hex digits; an octal escape with exactly three octal digits also works, since three digits is the upper limit for octal escapes.) – James Kanze Aug 04 '14 at 08:21
  • @jxh C++11 for some reason bans reference counting; it has always been very difficult to specify iterator guarantees with reference counting, and COW is very difficult to get both efficient and right in a multithreaded environment. (G++ was the last to use COW, I think, and it had a subtle bug in multithreading.) – James Kanze Aug 04 '14 at 08:24
  • @JamesKanze: Thanks for pointing out the bug with the hex escape. COW semantics is one of the reasons I will roll my own data structures for a project. – jxh Aug 04 '14 at 09:08
  • @jxh Been there, done that---my pre-standard string class used copy on write. And it would have been trivial to make it thread safe, and provide logical iterator guarantees, because its interface was designed to make it simple (and IMHO, simpler to use as well). – James Kanze Aug 04 '14 at 09:14
  • @jxh And while you're at it: your call to `isprint` also has undefined behavior: you need to convert `c` to an `unsigned char`. – James Kanze Aug 04 '14 at 09:16