6

I'm trying to create a std::regex(__FILE__) as part of a unit test which checks some exception output that prints the file name.

On Windows it fails with:

regex_error(error_escape): The expression contained an invalid escaped character, or a trailing escape.

because the __FILE__ macro expansion contains un-escaped backslashes.

Is there a more elegant way to escape the backslashes than to loop through the resulting string (i.e. with a std algorithm or some std::string function)?

Wiktor Stribiżew
Nicolas Holthaus
  • `__FILE__` should only print the filename. do you need the full path? – Hayt Aug 30 '16 at 13:34
  • 2
    @Hayt _"`__FILE__` should only print the filename."_ Not necessarily – πάντα ῥεῖ Aug 30 '16 at 13:35
  • yeah if he does not need them he can look that up here: https://msdn.microsoft.com/en-us/library/027c4t2s.aspx assuming the problem is not the missing quotation marks, which you have already answered. And assuming he uses MSVC compiler – Hayt Aug 30 '16 at 13:37
  • @Hayt in this case the full paths are desirable – Nicolas Holthaus Aug 30 '16 at 13:40
  • Then you most likely need to convert them manually. It's just how microsoft's paths work. – Hayt Aug 30 '16 at 13:41
  • Look here for some solutions: http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string – Hayt Aug 30 '16 at 13:42
  • @Hayt I guess that's what I was trying to get at with the question: is there a modern/elegant way of doing so. – Nicolas Holthaus Aug 30 '16 at 13:42
  • In the link I posted. either way implement it yourself or use `boost::replace_all` – Hayt Aug 30 '16 at 13:43
  • 1
    @NicolasHolthaus Maybe [std::transform()](http://en.cppreference.com/w/cpp/algorithm/transform) plus a lambda function could be helpful to write it in an _elegant way_. – πάντα ῥεῖ Aug 30 '16 at 13:47
  • ah didn't know about transform. definitively an elegant native-c++ way of doing this. – Hayt Aug 30 '16 at 13:48
  • If you use `/` *forward slash* in your build then `__FILE__` should contain forward slashes. (not able to test that theory). But I think it is supposed to contain whatever path was passed to the preprocessor. (and Windows accepts `/`). – Galik Aug 30 '16 at 13:48
  • @Galik not if you set the compiler to explicitly print full paths in __FILE__. – Hayt Aug 30 '16 at 13:49
  • @πάνταῥεῖ I like the `std::transform` idea, just not sure how exactly that would work since `\ ` is a `char` and `\\ ` is a string. – Nicolas Holthaus Aug 30 '16 at 13:50
  • you can replace the \ with a / – Hayt Aug 30 '16 at 13:54
  • @Hayt then the unit test will fail since printing the backslash is the expected/desired behavior. – Nicolas Holthaus Aug 30 '16 at 13:55
  • 1
    maybe it's just best then to write your own function then which goes through the string char by char and copies it and when it finds a \ add another one. – Hayt Aug 30 '16 at 13:56
  • @Hayt yeah I think I agree, it's just amusing/annoying that this isn't simpler with c++11. – Nicolas Holthaus Aug 30 '16 at 13:57
  • @NicolasHolthaus There is probably an algorithm which can do this somehow but I guess it will end up looking more complicated in the end. – Hayt Aug 30 '16 at 14:00
  • Why are you trying to use a file path as a regex? o.O – Lightness Races in Orbit Aug 30 '16 at 14:18
  • @LightnessRacesinOrbit because it's a simple way to check for the expected output, and the test already has a ton of other regexs to check for proper timestamps, formatting, etc. – Nicolas Holthaus Aug 30 '16 at 14:20
  • @NicolasHolthaus: Apparently not so simple ;) Honestly there are so many ways this can go wrong. Fortunately, you can do this properly-ish: http://stackoverflow.com/a/1253004/560648 – Lightness Races in Orbit Aug 30 '16 at 15:24
  • How are you using the regex to check the output? If you're just trying to compare __FILE__ to a literal string, then why not just compare the strings? – Adrian McCarthy Aug 30 '16 at 20:35
  • @AdrianMcCarthy In my real program it's just a component of a larger, more complex regex. – Nicolas Holthaus Aug 30 '16 at 20:38
  • File paths can contain several characters, besides backslashes, that have special meaning when part of a regular expression pattern: hyphens, braces, parentheses. You need to escape all of these. – Adrian McCarthy Aug 30 '16 at 20:43
  • @AdrianMcCarthy in this case I don't care because it's just for a unit test and the paths are well defined. The actual production code doesn't use `__FILE__` as a regex. – Nicolas Holthaus Aug 30 '16 at 20:44

3 Answers

4

File paths can contain many characters that have special meaning in regular expression patterns. Escaping just the backslashes is not enough for robust checking in the general case.

Even a simple path, like C:\Program Files (x86)\Vendor\Product\app.exe, contains several special characters. If you want to turn that into a regular expression (or part of a regular expression), you would need to escape not only the backslashes but also the parentheses and the period (dot).

Fortunately, we can solve our regular expression problem with more regular expressions:

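#include <regex>
#include <string>

std::string EscapeForRegularExpression(const std::string &s) {
  // Character class of metacharacters, including the backslash itself.
  static const std::regex metacharacters(R"([\\\.\^\$\-\+\(\)\[\]\{\}\|\?\*])");
  return std::regex_replace(s, metacharacters, "\\$&");
}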

(File paths can't contain * or ?, but I've included them to keep the function general.)
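As a rough illustration of how the escaped path can be dropped into a larger pattern, here is a minimal sketch. It repeats the escaping function so the snippet is self-contained. `MatchesLineAndPath` and the `<digits>: <path>` output format are assumptions made up for the example, not something from the question.

```cpp
#include <regex>
#include <string>

// Prefixes every regex metacharacter in `s` with a backslash so that the
// result matches `s` literally (ECMAScript grammar, the std::regex default).
std::string EscapeForRegularExpression(const std::string &s) {
  static const std::regex metacharacters(R"([\\\.\^\$\-\+\(\)\[\]\{\}\|\?\*])");
  return std::regex_replace(s, metacharacters, R"(\$&)");
}

// Hypothetical helper: returns true if `text` has the form "<digits>: <path>",
// where <path> must appear exactly as given.
bool MatchesLineAndPath(const std::string &text, const std::string &path) {
  const std::regex pattern(R"(\d+: )" + EscapeForRegularExpression(path));
  return std::regex_match(text, pattern);
}
```

In a test, something like `MatchesLineAndPath(output, __FILE__)` would then treat the dots, parentheses and backslashes in the path as literal characters rather than as regex syntax.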

If you don't abide by the "no raw loops" guideline, a probably faster implementation would avoid regular expressions:

#include <cstring>
#include <string>

std::string EscapeForRegularExpression(const std::string &s) {
  static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
  std::string out;
  out.reserve(s.size());
  for (auto ch : s) {
    // Prefix each metacharacter with a backslash; copy everything else as-is.
    if (std::strchr(metacharacters, ch))
      out.push_back('\\');
    out.push_back(ch);
  }
  return out;
}

Although the loop adds some clutter, this approach allows us to drop a level of escaping on the definition of metacharacters, which is a readability win over the regex version.

Adrian McCarthy
  • @Nicolas Holthaus: Sean Parent of Adobe proposes the "no raw loops" idea in this video: https://channel9.msdn.com/Events/GoingNative/2013/Cpp-Seasoning – Adrian McCarthy Sep 01 '16 at 17:50
1

EDIT

In the end, I switched to @AdrianMcCarthy 's more robust approach.


Here's the inelegant way I solved the problem, in case someone stumbles on this actually looking for a workaround:

std::string escapeBackslashes(const std::string& s)
{
    std::string out;

    for (auto c : s)
    {
        out += c; 
        if (c == '\\') 
            out += c;
    }

    return out;
}

and then

std::regex(escapeBackslashes(__FILE__));

It's O(N) which is probably as good as you can do here, but involves a lot of string copying which I'd like to think isn't strictly necessary.
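For readers who specifically want a standard-library call instead of a hand-written loop, one possible sketch uses `std::regex_replace` to double each backslash. The name `escapeBackslashesWithRegex` is made up for this example. It still builds a new string, so it is unlikely to be faster than the loop; it only removes the explicit iteration.

```cpp
#include <regex>
#include <string>

// Doubles every backslash in `s`, leaving all other characters untouched.
std::string escapeBackslashesWithRegex(const std::string& s)
{
    // The pattern \\ matches a single literal backslash.
    static const std::regex backslash(R"(\\)");
    // In the default (ECMAScript) replacement format only '$' is special,
    // so this replacement text is two literal backslashes.
    return std::regex_replace(s, backslash, R"(\\)");
}
```

It would be used the same way as the loop version, e.g. `std::regex(escapeBackslashesWithRegex(__FILE__))`, and has the same limitation: only backslashes are escaped.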

Nicolas Holthaus
  • All this does is escape the backslashes, which is insufficient for transforming a Windows file path into a valid regular expression pattern. It doesn't do anything with other regular expression meta characters that can be in path names, like parentheses. – Adrian McCarthy Aug 30 '16 at 20:40
  • @AdrianMcCarthy sure, but that's all it was intended to do. It was meant for a unit test, not as a general purpose `regex` maker, and solved the one and only one problem I needed it to. – Nicolas Holthaus Aug 30 '16 at 20:43
1

Here is polymapper.

It takes an operation that takes an element and returns a range, the "map operation".

It produces a function object that takes a container, and applies the "map operation" to each element. It returns the same type as the container, where each element has been expanded/contracted by the "map operation".

#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

template<class Op>
auto polymapper( Op&& op ) {
  return [op=std::forward<Op>(op)](auto&& r) {
    using std::begin;
    using R=std::decay_t<decltype(r)>;
    using iterator = decltype( begin(r) );
    using T = typename std::iterator_traits<iterator>::value_type;
    std::vector<T> data;
    for (auto&& e:decltype(r)(r)) {
      for (auto&& out:op(e)) {
        data.push_back(out);
      }
    }
    return R{ data.begin(), data.end() };
  };
}

Here is escape_stuff:

auto escape_stuff = polymapper([](char c)->std::vector<char> {
  if (c != '\\') return {c};
  else return {c,c};
});

live example.

int main() {
  std::cout << escape_stuff(std::string(__FILE__)) << "\n";
}

The advantage of this approach is that the action of messing with the guts of the container is factored out. You write code that messes with the characters or elements, and the overall logic is not your problem.

The disadvantage is that polymapper is a bit strange and performs needless memory allocations. (Those could be optimized out, but that would make the code more convoluted.)
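One way the extra allocations might be avoided is to have the map operation push its results through a callback instead of returning a temporary range. The sketch below is only an illustration of that idea under C++14; `polymapper_emit` and `escape_stuff_emit` are hypothetical names, and it assumes the container supports `insert(end(), value)`, as `std::string` and `std::vector` do.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical variant of polymapper: the map operation is called as
// op(element, emit) and calls emit(x) once per output element, so no
// temporary vector is created per element or for the whole result.
template<class Op>
auto polymapper_emit( Op op ) {
  return [op = std::move(op)](const auto& r) {
    using R = std::decay_t<decltype(r)>;
    R result;
    for (const auto& e : r) {
      op(e, [&result](const auto& out) { result.insert(result.end(), out); });
    }
    return result;
  };
}

// Emits a backslash twice and every other character once.
auto escape_stuff_emit = polymapper_emit([](char c, auto&& emit) {
  if (c == '\\') emit(c);
  emit(c);
});
```

Calling `escape_stuff_emit(std::string(__FILE__))` would then return a `std::string` with the backslashes doubled, at the cost of a slightly less obvious signature for the map operation.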

Yakk - Adam Nevraumont