2

For the regular expression \w+\d, many scripting languages such as Perl and Python let you write it literally. But in C/C++, I must write it as:

const char *re_str = "\\w+\\d";

which is ugly to read.

Is there any way to avoid this? Macros are also acceptable.

zhaorufei
  • That is the language syntax for char array literals. The alternative is probably even worse: `'\', 'w', '+', '\', 'd', '\0'` – Amardeep AC9MF Oct 20 '10 at 13:35
  • @Amardeep: Does `'\'` even work? wouldn't that \ be misinterpreted as `\'`? – sbi Oct 20 '10 at 13:44
  • @sbi: You are quite correct. The escaping isn't limited just to double quote literals but affects character literals as well. I should have known since I put the \0 in there for a terminator. :-) – Amardeep AC9MF Oct 20 '10 at 13:53

4 Answers

10

Just as an FYI, the next C++ standard (C++0x) will have something called raw string literals, which should let you do something like:

const char *re_str = R"(\w+\d)";

However, until then I think you're stuck with the pain of doubling up your backslashes if you want the regex to be a literal in the source file.
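As a minimal sketch of how this looks once a compiler supports it (assuming a C++11 toolchain with a working `<regex>`; the names `escaped_re`, `raw_re` and `matches` are illustrative only), the raw literal and the escaped literal should end up holding exactly the same characters:

```cpp
#include <cassert>
#include <cstring>
#include <regex>
#include <string>

// Escaped form: every backslash has to be doubled in the source.
const char *escaped_re = "\\w+\\d";

// Raw form (C++11): the text between R"( and )" is taken verbatim.
const char *raw_re = R"(\w+\d)";

// Hypothetical helper: true if the whole input matches the pattern.
bool matches(const char *pattern, const std::string &input) {
    return std::regex_match(input, std::regex(pattern));
}
```

Both pointers can be passed to the same regex code, so switching to the raw form only changes how the source reads, not what the program sees.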

Michael Burr
  • Great, thank you! This is my dream feature for C++. Unfortunately it's not usable at present, and for C the situation is even worse. – zhaorufei Oct 21 '10 at 05:09
  • I checked with VC2010 and it is not supported yet: error C3861: 'R': identifier not found – zhaorufei Oct 21 '10 at 05:55
7

While I was reading Chapter 3 (Preprocessors) of [C: A Reference Manual], an idea emerged:

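 #include <stdio.h>
 #define STR(a) #a   /* stringify: quotes and backslashes get escaped */
 /* Keep the stringified text in an array, then strip its surrounding quotes. */
 #define R(var, re)  static char var##_[] = STR(re);\
 const char * var = ( var##_[ sizeof(var##_) - 2] = '\0',  (var##_ + 1) );

 int main(void) {
     R(re, "\w\d");
     printf("Hello, world[%s]\n",  re);
     return 0;
 }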

It's portable across C and C++ and uses only standard preprocessing features. The trick is to let the # operator escape each \ inside the string literal, and then to remove the leading and trailing double-quote characters that it also keeps.

For now I think it's the best way, until C++0x really introduces the new string literal syntax R"...". And for C I think it'll be the best way for a long time.

The drawback is that such a variable cannot be defined at global scope in C, because the initializer contains an assignment that removes the trailing double-quote character, and C requires constant initializers at file scope. In C++ it's OK.
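To make the intermediate step visible, here is a small sketch compiled as C++. It assumes the compiler accepts an unrecognized escape such as \w inside a string literal that is only used as a macro argument; the names `stringified` and `digit_re` are made up for the illustration.

```cpp
#include <cassert>
#include <cstring>

#define STR(a) #a
#define R(var, re)  static char var##_[] = STR(re);\
const char * var = ( var##_[ sizeof(var##_) - 2] = '\0',  (var##_ + 1) );

// What the # operator produces before any stripping: the quote characters
// are part of the data, and each backslash is a single character.
const char stringified[] = STR("\w\d");

// Namespace-scope use, which C++ allows because the initializer may run
// code at startup. No trailing semicolon: the macro already ends in one.
R(digit_re, "\w\d")
```

After start-up, `digit_re` points one character past the leading quote, and the trailing quote has been overwritten with a terminator, so only the four characters of the pattern remain.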

zhaorufei
2

You can put your regexps in a file and read them at run time if you have a lot of them or need to modify them often. That's the only way I see to avoid backslashes.
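A minimal sketch of that idea, assuming one pattern per line in a plain text file; `read_pattern`, `read_pattern_file` and the file name are hypothetical, not part of any library:

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read one pattern (the next line) from a stream.
// Characters are taken as-is, so a backslash in the file is one backslash.
std::string read_pattern(std::istream &in) {
    std::string line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper: read the first pattern from a file such as
// "patterns.txt". Returns an empty string if the file cannot be opened.
std::string read_pattern_file(const std::string &path) {
    std::ifstream file(path.c_str());
    if (!file) return std::string();
    return read_pattern(file);
}
```

The file would then contain \w+\d exactly as you would type it in Perl or Python, at the cost of an extra file to ship and a failure case to handle when it is missing.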

Benoit Thiery
1

No. There is only one kind of string literal in C++, and it's the kind that processes escape sequences.

wilhelmtell