
I have this pattern for my command-line program:
^s?([/|@#])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:(?:gi?|ig)?(?:\1\d\d?)?|i)?$
It uses the ECMAScript (ECMA-262) regex grammar in C++.

This is a special pattern that checks whether the user has entered a correct command. It is tested against a string like this:
optional-s/one-or-more/anything/optional-g-or-i/optional-2-digits

My previous question explains why I need this pattern.
Although it works fine on Linux, it does not work on Windows. I know about the line-break differences between the two systems, and I have read this: How are \n and \r handled differently on Linux and Windows?

My program works with any files; it only takes the first command-line argument, argv[ 1 ], and std::regex_match tests whether the synopsis the user entered is correct.
For example: ./program 's/one/two/' *.txt simply renames one to two for all txt files.

The C++ code:

std::string argv_1 = argv[ 1 ]; // => s/one/two/
bool rename_is_correct =
    std::regex_match( argv_1, std::basic_regex< char >
        ( "s?([/|@#])(?:(?!\\1).)+\\1(?:(?!\\1).)*\\1(?:(?:gi?|ig)?(?:\\1-?[1-9]\\d?)?|i)?" ) );
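For readers who want to try the check in isolation, here is a minimal sketch that wraps the same pattern in a function. The name is_rename_command is hypothetical and not part of the original program; the expected results in the usage note assume a standard library without the behaviour described in this question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original program): returns true when the
// whole argument matches the rename synopsis.
bool is_rename_command(const std::string& arg)
{
    // Same pattern as in the question, split over two string literals.
    // It is static so that it is compiled only once.
    static const std::regex pattern(
        "s?([/|@#])(?:(?!\\1).)+\\1(?:(?!\\1).)*\\1"
        "(?:(?:gi?|ig)?(?:\\1-?[1-9]\\d?)?|i)?");

    // std::regex_match succeeds only if the entire string matches.
    return std::regex_match(arg, pattern);
}
```

On a conforming implementation, is_rename_command("s/one/two/") should return true and is_rename_command("/one/two/three/four/five/") should return false. A real program would also check argc before reading argv[ 1 ].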

The problem:
Although the pattern is non-greedy, on Windows it becomes greedy and matches more than 4 delimiters. It should therefore not match /one/two/three/four/five/, but this string is matched!


NOTE:

  • I have deliberately dropped the ^ and $ assertions, since std::regex_match in C++ only succeeds when the whole string matches, so they are not needed.
  • Also, the backslashes are doubled (\\) because the first one is the escape character inside a C++ string literal.
  • The JavaScript code says no:

const regex = /^s?([/|@#])(?:(?!\1).)+\1(?:(?!\1).)*\1((?:gi?|gi)\1-?[1-9]\d|i)?$/gm;
var str = 's/one/two/gi/-33/';
if( str.match( regex ) ){
  console.log( "okay" );
} else {
  console.log( "no" );
}
  • Perl also says no, as you can see in the screenshot, but C++ says okay.
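The note about dropping ^ and $ relies on the difference between std::regex_match and std::regex_search. A minimal sketch of that difference follows; the helper names matches_whole and matches_anywhere are made up for illustration.

```cpp
#include <regex>
#include <string>

// Whole-string test: succeeds only if the pattern consumes the entire text,
// much as if the pattern were wrapped in ^...$.
bool matches_whole(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Substring test: succeeds if the pattern matches anywhere in the text.
bool matches_anywhere(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}
```

With the pattern "b+", matches_whole accepts "bbb" but rejects "abbbc", while matches_anywhere accepts both.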


Does anyone know why it becomes greedy?

Thanks.

Shakiba Moshiri
  • on Windows but what compiler? – phuclv Mar 06 '17 at 14:32
  • both `g++` and the same options – Shakiba Moshiri Mar 06 '17 at 14:32
  • GCC 6.3 on Unix - https://ideone.com/lKEh1S - says "no". Looks like the GCC must be updated on your machine. – Wiktor Stribiżew Mar 06 '17 at 15:10
  • You know, the only "tangible" advice I have seen in these cases is "switch" to Boost regex library. – Wiktor Stribiżew Mar 06 '17 at 15:21
  • @WiktorStribiżew Oh, I tested it and you made a mistake: the output of `gcc 6.3.0` is correct and says **no** ..... I am familiar with **boost**, and my program works fine on Linux, but I wanted to test it on Windows since it uses the **standard library** – Shakiba Moshiri Mar 06 '17 at 15:25
  • @WiktorStribiżew [You can see here](http://melpon.org/wandbox/permlink/7JDG0GLu4476V7Cv) with `gcc 6.3.0` that it says **no**, but if you take away the last delimiter `/` then it says **okay** – Shakiba Moshiri Mar 06 '17 at 15:28
  • That string you tested just [does not match](https://regex101.com/r/E82iQC/1) the regex. – Wiktor Stribiżew Mar 06 '17 at 16:17
  • @WiktorStribiżew Yes, it should not match that because it has more than 4 delimiters. It should match `s/.//gi/33` but should not match `s/.//gi/33/`. Please look at the regex that I have: it limits the number of delimiters to **2 up to 4**, not more and not less. Or see my previous question that I linked above in the question. Also, I am sorry that my English writing is not very good. – Shakiba Moshiri Mar 06 '17 at 16:23
  • Which version of gcc are you running on windows? – trincot Mar 06 '17 at 21:02

1 Answer


There seems to have been a bug in GCC that got fixed in version 5.4. My guess is you are running an older version on your Windows set-up.

See the difference in output in:

It does not seem to make a difference whether boost is included or not.

The bug is related to (?!\\1), as replacing it by (?![/]) (in both instances) solves the issue, but obviously that would limit the regular expression for use with the / delimiter only:
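One possible way to keep all four delimiters while avoiding a back-reference inside the lookahead is to read the delimiter from the input first and build the pattern around it. This is only a sketch under that assumption, not the code from this answer; pattern_for and is_rename_command_no_backref are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the synopsis pattern for one concrete delimiter,
// so that no back-reference appears anywhere in the pattern.
std::string pattern_for(char delim)
{
    // All four allowed delimiters (/ | @ #) are literal inside [...].
    const std::string d = std::string("[") + delim + "]";
    return "s?" + d + "(?:(?!" + d + ").)+" + d + "(?:(?!" + d + ").)*" + d +
           "(?:(?:gi?|ig)?(?:" + d + "-?[1-9]\\d?)?|i)?";
}

// Hypothetical replacement for the single back-reference pattern.
bool is_rename_command_no_backref(const std::string& arg)
{
    // The delimiter is the first character, or the second after a leading 's'.
    const std::string::size_type pos = (!arg.empty() && arg[0] == 's') ? 1 : 0;
    if (pos >= arg.size()) return false;

    const char delim = arg[pos];
    if (std::string("/|@#").find(delim) == std::string::npos) return false;

    return std::regex_match(arg, std::regex(pattern_for(delim)));
}
```

The trade-off is that the regex is compiled on every call, which should not matter for a single command-line argument.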

Also, the bug appears with this simple regular expression: (.)((?!\\1).) which should reject an input like aa:
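A minimal sketch of such a reduced test case, wrapped in a function with the made-up name two_different_chars, could look like this:

```cpp
#include <regex>
#include <string>

// Returns true when `text` consists of exactly two characters and the second
// one differs from the first: group 1 captures the first character, and the
// negative lookahead (?!\1) forbids the same character from following it.
bool two_different_chars(const std::string& text)
{
    static const std::regex pattern("(.)((?!\\1).)");
    return std::regex_match(text, pattern);
}
```

A correct implementation returns false for "aa" and true for "ab"; a true result for "aa" would point to the bug described here.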

Conclusion: make sure to install GCC version 5.4 or higher.

trincot
  • Thank you very much. I was upset that my program had a problem. Unfortunately I used `gcc 5.3.0` on Windows and could not update it to `6.2.0` from source. I blamed `gcc` myself, but I was not sure that such a bug even existed in `gcc`. – Shakiba Moshiri Mar 07 '17 at 07:59