7

Using std::regex and given a file path, I want to match only the filenames that end with .txt and that are not of the form _test.txt or .txtTEMP. Any other underscore is fine.

So, for example:

  • somepath/testFile.txt should match.
  • somepath/test_File.txt should match.
  • somepath/testFile_test.txt should not match.
  • somepath/testFile.txtTEMP should not match.

What is the correct regex for such a pattern?

What I have tried:

(.*?)(\.txt) ---> This matches any file path ending with .txt.

To exclude files that contain _test, I tried to use a negative lookahead:

(.*?)(?!_test)(\.txt)

But it didn't work.

I also tried negative lookbehind but MSVC14 (Visual Studio 2015) throws a std::regex_error exception when creating the regex, so I'm not sure if it's not supported or I'm using the wrong syntax.

Banex

4 Answers

2

Based on what you posted, use this pattern:

^(?!.*_).*\.txt$

Or, based on the OP's edit, use this pattern:

^(.*(?<!_test)\.txt$)


alpha bravo
2
^(?!.*?_test\.).*\.txt$

I do not have access to VS 2015 at the moment, but this only uses a lookahead, so it should work.
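As a minimal sketch of how this pattern could be used with std::regex (the function names isWantedTxt and checkLookaheadExamples are hypothetical, and the test paths are the four examples from the question):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole path matches the pattern above.
bool isWantedTxt(const std::string& path) {
    // Raw string literal, so the backslashes need no extra escaping.
    static const std::regex pattern(R"(^(?!.*?_test\.).*\.txt$)");
    return std::regex_match(path, pattern);
}

// Checks the four example paths from the question.
void checkLookaheadExamples() {
    assert(isWantedTxt("somepath/testFile.txt"));
    assert(isWantedTxt("somepath/test_File.txt"));
    assert(!isWantedTxt("somepath/testFile_test.txt"));
    assert(!isWantedTxt("somepath/testFile.txtTEMP"));
}
```

Because the lookahead rejects "_test." anywhere in the path, a name such as _test.file.txt would also be rejected.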

Alexander Balabin
  • Very strictly speaking: it does not allow '_test.file.txt' - which according to the specs should probably be allowed. However, this is almost certainly the best solution for the practical case. – Jan de Vos Jul 27 '15 at 17:07
  • I suppose I could've added `txt` to the lookahead to address that. – Alexander Balabin Jul 27 '15 at 17:11
1

Best bet? Don't use regexes. Particularly in a simplistic string search case like this one.

First, there are a couple of simple optimizations that can be made given the question's parameters:

  1. Since the input string's extension must be ".txt", we don't need to check whether the extension is ".txtTEMP".
  2. The only remaining non-match condition is an input string ending in "_test.txt". Because the extension is already known to be ".txt", this only requires checking that the stem ends in "_test".

Both of these checks are always at a fixed offset from the end of the input string. Since everything these checks need is known in advance, it should be set up at compile time:

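// Character arrays (not pointers), so sizeof yields the length at compile time;
// strlen is not constexpr in standard C++ and may be rejected by some compilers.
constexpr char doMatch[] = ".txt";
constexpr auto doMatchSize = sizeof(doMatch) - 1;  // exclude the trailing '\0'
constexpr char doNotMatch[] = "_test";
constexpr auto doNotMatchSize = sizeof(doNotMatch) - 1 + doMatchSize;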

Given a string named input, it could be tested for success as follows:

if(input.size() >= doMatchSize &&
   equal(input.end() - doMatchSize, input.end(), doMatch) &&
   (input.size() < doNotMatchSize ||
   !equal(input.end() - doNotMatchSize, input.end() - doMatchSize, doNotMatch)))

You can see a live example here: http://ideone.com/7BcyFi
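The same idea can also be sketched as a small self-contained function. This is only one possible arrangement: the names endsWith, isWantedTxtNoRegex and checkNoRegexExamples are hypothetical, and the suffixes are spelled out directly rather than taken from the constants above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true when `text` ends with `suffix`.
bool endsWith(const std::string& text, const std::string& suffix) {
    // Compare from the back; the size check keeps the iterators in range.
    return text.size() >= suffix.size() &&
           std::equal(suffix.rbegin(), suffix.rend(), text.rbegin());
}

// Accepts paths ending in ".txt" unless they end in "_test.txt".
bool isWantedTxtNoRegex(const std::string& path) {
    return endsWith(path, ".txt") && !endsWith(path, "_test.txt");
}

// Checks the four example paths from the question.
void checkNoRegexExamples() {
    assert(isWantedTxtNoRegex("somepath/testFile.txt"));
    assert(isWantedTxtNoRegex("somepath/test_File.txt"));
    assert(!isWantedTxtNoRegex("somepath/testFile_test.txt"));
    assert(!isWantedTxtNoRegex("somepath/testFile.txtTEMP"));
}
```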

Jonathan Mee
0

One trick to emulate the lookbehind that you would really want (but which is unfortunately not supported in C++11) is to reverse the string and then use a lookahead. Your regexp would become something like

^txt\.(?!tset_).*
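A rough sketch of that reversal trick with std::regex might look as follows. The function names isWantedTxtReversed and checkReversedExamples are hypothetical, and the test paths are the ones from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: reverses the path, then applies the reversed pattern.
bool isWantedTxtReversed(const std::string& path) {
    static const std::regex reversedPattern(R"(^txt\.(?!tset_).*)");
    // Build the reversed copy from the reverse iterators.
    const std::string reversed(path.rbegin(), path.rend());
    return std::regex_match(reversed, reversedPattern);
}

// Checks the four example paths from the question.
void checkReversedExamples() {
    assert(isWantedTxtReversed("somepath/testFile.txt"));
    assert(isWantedTxtReversed("somepath/test_File.txt"));
    assert(!isWantedTxtReversed("somepath/testFile_test.txt"));
    assert(!isWantedTxtReversed("somepath/testFile.txtTEMP"));
}
```

The cost is one extra copy of the string per call, which is usually negligible for file paths.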

The problem with the lookahead you tried is that it applies at the same position where the '.txt' part must start matching. So the part '(?!_test)(\.txt)' of your regexp says 'I want something that does not start with _test, but does match .txt'. Anything ending in .txt satisfies that, which is why it does not work.

Update: a regex with negative lookbehind (that will NOT work in C++, but works in, for instance, Python):

^.*(?<!_test)\.txt$
Jan de Vos
  • Thank you for your explanation. I tried lookbehind because I realized lookahead would not work, as you confirmed. Could you please include in your answer the correct regex using lookbehind? I'll check if it works in VS2015 (maybe it was my syntax being wrong), and anyway it could be useful for other regex implementations. – Banex Jul 27 '15 at 16:47
  • @Banex: unfortunately, there is no correct syntax for the lookbehind - it's not supported in the regex dialect mandated by C++11. See also this question: http://stackoverflow.com/questions/14538687/using-regex-lookbehinds-in-c11 – Jan de Vos Jul 27 '15 at 16:53