
I have a question about compile-time vs. runtime string literals in C++. At compile time, creating a string literal and passing it (for example) to a regex processor

  std::string pattern = "a\\d+";

results in a "post-compiled" literal sequence of chars 'a' '\' 'd' '+' '\0'

At runtime, a user provides this same "literal string" of characters through a command-line interface, by typing (for example)

   set pattern 'a\\d+'

and this results in the "literal" sequence of chars 'a' '\' '\' 'd' '+' '\0'

Is there any way to use or leverage the mechanism C++ uses to convert "compile-time" representations of strings into the actual C++ string representation where escapes have been processed correctly?

Wiktor Stribiżew
  • Why do you want the user to escape the string? – apple apple Aug 23 '21 at 17:12
  • FWIW, Python has a special literal format for regular expressions r"a\d+" This is translated into your regular expression without the need to escape the \ – Mark Lavin Aug 23 '21 at 17:14
  • Escape sequences are only processed at compile-time, there is no need to enter an escaped string at runtime (unless the terminal requires it, for instance): `std::string pattern = "a\\d+"` and `set pattern 'a\d+'` will produce the same character sequence `'a' '\' 'd' '+' '\0'` – Remy Lebeau Aug 23 '21 at 17:18
  • @MarkLavin C++ has that as well (since C++11). See "Raw string" at https://en.cppreference.com/w/cpp/language/string_literal Though, it's obviously compile-time only because C++. – 3Dave Aug 23 '21 at 17:48
  • Users shouldn't have to be concerned with language-level details like escape sequences. The user should type `a\d+` and your program should deal with it (which it will do correctly unless you go out of your way to "help" your users) – Pete Becker Aug 23 '21 at 18:16
  • `At runtime, a user provides this same "literal string"`. Be careful with your words. A `string literal` has a specific meaning. The runtime user does not input a `string literal`; they enter a `string`. Why would a user inputting a string want to escape it in the first place? Users inputting `\d` on the command line in any other command-line tool would not type `\\d`; see grep/sed/awk for examples. – Martin York Aug 23 '21 at 19:19
  • C++ string literals have quite complex syntax. There are single-character escape sequences, octal escape sequences, hexadecimal escape sequences, universal character names, raw strings, all interacting with each other in non-trivial ways. There are no standard library functions to interpret this syntax, because it is complex and not generally useful outside of writing C++ code. OTOH a simplified syntax can be done in a few lines of code, but everyone needs a slightly different simplified syntax, so no standard function here either. Write your own. – n. m. could be an AI Aug 23 '21 at 19:55
  • @MartinYork "Why would a user inputting a string want to escape them in the first place". Why would a C++ programmer want to escape a character in a string literal? Now imagine a small (or large) programming language interpreter written in C++. – n. m. could be an AI Aug 23 '21 at 19:57
  • @PeteBecker "The user should type a\d+ and your program should deal with it" The user *of the C++ compiler* should type `a\d+` and *the C++ compiler* should deal with it, right? The C++ compiler is somebody's program too. – n. m. could be an AI Aug 23 '21 at 20:01
  • @n.1.8e9-where's-my-sharem. Thanks. I was coming to that conclusion myself, but I was hoping there was some posix or gnu function that I didn't know about. We are hemmed in by user expectations here and I was trying to avoid custom code. – Sam Appleton Aug 23 '21 at 20:37
  • @n.1.8e9-where's-my-sharem. My point of reference here for user behavior is the TCL interpreter. TCL does its own escape sequence interpolation for quoted strings "" (TclParseBackslash), but does no interpolation on literal strings provided by the user. – Sam Appleton Aug 23 '21 at 21:11
  • @n.1.8e9-where's-my-sharem. -- C++ has raw literals; if you like them, use them. – Pete Becker Aug 23 '21 at 22:15
  • You are misusing the term "literal". In C++, a [literal](https://en.cppreference.com/w/cpp/language/expressions#Literals) is a constant value **embedded in the source code**. (While "string literal" is the most common use of this term, it also applies to other types, such as a "floating-point literal".) You've misused the term when referring to things not embedded in the source code. In particular, there is no way for a user to provide a literal at runtime. I think your question is good, but your misuse of terminology makes it look less so. – JaMiT Aug 23 '21 at 22:31
  • @JaMIT thanks for your comment. I struggled with understanding the correct terminology here and tried to pose the question in a way that would help get an answer. – Sam Appleton Aug 23 '21 at 23:35
  • Does this answer your question? [Convert string with explicit escape sequence into relative character](https://stackoverflow.com/questions/5612182/convert-string-with-explicit-escape-sequence-into-relative-character) – xskxzr Aug 24 '21 at 03:36
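
As a small illustration of the points raised in the comments above (escape sequences are resolved when the source is compiled, and raw string literals skip that step), the sketch below builds the same pattern both ways. The function names `escaped_pattern` and `raw_pattern` are made up for this example.

```cpp
#include <string>

// Ordinary literal: the compiler turns the source text \\ into one backslash,
// so the resulting string holds the four characters a, \, d, +.
std::string escaped_pattern() { return "a\\d+"; }

// Raw literal (C++11): no escape processing, so the text between the
// parentheses is taken as written. It also holds a, \, d, +.
std::string raw_pattern() { return R"(a\d+)"; }
```

Both functions return equal strings. A user who types `a\d+` at a prompt produces that same character sequence, with no escaping involved.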

1 Answer

The mechanism C++ uses to process escape sequences in character and string literals is phase 5 of translation, one of the steps the compiler performs on source code. This is built into the compiler toolchain and is not available to a running C++ program.

If you step outside of C++, most operating systems provide a way to invoke external commands from within a program. If that is available, and if your user's compiler (exists and) offers the option to perform this single phase of translation on a source file, then it should be possible to read a string from the user, invoke the external program and collect the result. I am not aware of a compiler offering this option, but it is theoretically possible.

On the other hand, that seems like a lot of work for a simple task. There are eleven simple escape sequences. A fairly simple scan-and-replace could handle them. The numeric and universal escape sequences would take a bit more work, but still not hugely difficult (especially if you decide that your user does not need all of the options available to a C++ programmer). At this point, we seem to be entering the realm of [Convert string with explicit escape sequence into relative character](https://stackoverflow.com/questions/5612182/convert-string-with-explicit-escape-sequence-into-relative-character), though, so I will refer you to that question for more information.
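
To give a rough idea of the scan-and-replace approach, here is a minimal sketch covering only the eleven simple escape sequences. The name `unescape_simple` is hypothetical, not a standard or library function. The sketch assumes that numeric and universal escapes are out of scope, and that an unrecognized escape (such as `\d`) or a trailing backslash should be passed through unchanged. Those are design choices for this example, not what a C++ compiler does with source code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the eleven simple C++ escape sequences in
// `in` with the characters they denote.
// Assumptions: octal, hexadecimal and universal-character escapes are not
// handled; unknown escapes and a trailing backslash are copied unchanged.
std::string unescape_simple(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Copy ordinary characters, and a backslash that ends the input.
        if (in[i] != '\\' || i + 1 == in.size()) {
            out += in[i];
            continue;
        }
        const char c = in[++i];  // character following the backslash
        switch (c) {
            case '\'': out += '\''; break;
            case '"':  out += '"';  break;
            case '?':  out += '?';  break;
            case '\\': out += '\\'; break;
            case 'a':  out += '\a'; break;
            case 'b':  out += '\b'; break;
            case 'f':  out += '\f'; break;
            case 'n':  out += '\n'; break;
            case 'r':  out += '\r'; break;
            case 't':  out += '\t'; break;
            case 'v':  out += '\v'; break;
            default:   // not a simple escape: keep both characters as typed
                out += '\\';
                out += c;
                break;
        }
    }
    return out;
}
```

With this sketch, the five characters a user types as `a\\d+` become the four characters `a\d+`, matching what the compiled literal `"a\\d+"` holds. Because unknown escapes pass through, a user who types `a\d+` directly gets the same result.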

JaMiT