
I was experimenting with regular expressions while trying to write an answer to this question, and found that while regex_match finds a match, regex_search does not.

The following program was compiled with g++ 4.7.1:

#include <regex>
#include <string>
#include <iostream>

int main()
{
    const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
    std::regex rgx(".*FILE_(.+)_EVENT\\.DAT.*");
    std::smatch match;

    if (std::regex_match(s.begin(), s.end(), rgx))
        std::cout << "regex_match: match\n";
    else
        std::cout << "regex_match: no match\n";

    if (std::regex_search(s.begin(), s.end(), match, rgx))
        std::cout << "regex_search: match\n";
    else
        std::cout << "regex_search: no match\n";
}

Output:

regex_match: match
regex_search: no match

Is my assumption that both should match wrong, or might there be a problem with the library in GCC 4.7.1?

Some programmer dude
  • GCC's regex library is still largely unimplemented (GCC Y U SHIP BROKEN `<regex>`?), so I'm inclined to say that, yes, it's a problem with the library. – R. Martinho Fernandes Jul 24 '12 at 09:37
  • Your program yields a match twice with VS2010 so I guess the problem is with the library that gcc uses. Have you tried using the boost-version of the regex-library? – MadScientist Jul 24 '12 at 10:00
  • @MadScientist VS2010 seems to be right here. Considering the test string and the regular expression used, both should be a match. For instance, both `re.match` and `re.search` from Python, give a match too. – betabandido Jul 24 '12 at 10:23
  • Why not just use the POSIX API instead? –  Jul 24 '12 at 10:54
  • @H2CO3 POSIX regex is plain, simple and good C, but might not be win32/win64 compatible (last time I tried it wasn't). – rubber boots Jul 24 '12 at 11:14
  • @rubberboots sure, that's why it's POSIX... –  Jul 24 '12 at 11:18
  • @H2CO3 Because it's not C++? :) – Some programmer dude Jul 24 '12 at 11:23
  • @JoachimPileborg Oh wait, I thought you could use C from C++...? –  Jul 24 '12 at 11:29
  • @H2CO3 You can also use assembly, but you may not want to do that :) Same thing for C functions included in C++. They break the C++ programming style (e.g., you need to start checking for return codes, instead of caring about exceptions). – betabandido Jul 24 '12 at 11:39
  • @H2CO3 Yes of course, but at the moment I'm trying to get to know the new C++11 standard and all its features. In a real-world scenario I would probably use Boost, or whatever is available in the system (such as standard POSIX functionality). – Some programmer dude Jul 24 '12 at 11:40
  • @betabandido you wanna fight? Have a look at the Objective-C runtime's messaging functions... :D –  Jul 24 '12 at 11:55
  • @H2CO3 :) I definitely would use POSIX regex (or Boost.Regex) if I needed to implement something like that in real code. But the OP's question is a good one. Actually I wonder why GCC decided to release a broken implementation of regular expressions. No support at all is better than broken support... – betabandido Jul 24 '12 at 12:01
  • @betabandido yes of course... Anyway Glibc is weird... –  Jul 24 '12 at 12:09
  • Yes, libstdc++'s [`<regex>` is unimplemented](http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631). The reason it's shipped is that someone contributed a partial, experimental implementation, as is common in open source projects, then, as also sometimes happens in open source projects, they vanished, leaving it unfinished. Because some symbols are exported from `libstdc++.so` it can't be removed again without changing the ABI. If you don't like it then you could help implement it, pay someone to implement it, or stop complaining about code you got for free thanks to other people's hard work! – Jonathan Wakely Jul 24 '12 at 15:51
  • Does this answer your question? [Difference between std::regex\_match & std::regex\_search?](https://stackoverflow.com/questions/26696250/difference-between-stdregex-match-stdregex-search) – ggorlen May 04 '22 at 22:56
  • @ggorlen I would argue that it's the opposite way around, that newer question is a duplicate of my older question. :) – Some programmer dude May 05 '22 at 05:17

4 Answers


Assuming that C++11 <regex> and Boost.Regex have a similar structure and functionality, the difference between regex_match and regex_search is explained here:

The regex_match() algorithm will only report success if the regex matches the whole input, from beginning to end. If the regex matches only a part of the input, regex_match() will return false. If you want to search through the string looking for sub-strings that the regex matches, use the regex_search() algorithm.
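To make the distinction concrete, here is a minimal sketch that assumes a conforming `<regex>` implementation (for example Boost-like behaviour, VS, libc++, or a later libstdc++). The helper names `matches_whole` and `matches_anywhere` are hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <string>

// True only if the pattern accounts for the entire input (regex_match).
bool matches_whole(const std::string& input, const std::string& pattern)
{
    return std::regex_match(input, std::regex(pattern));
}

// True if the pattern matches some substring of the input (regex_search).
bool matches_anywhere(const std::string& input, const std::string& pattern)
{
    return std::regex_search(input, std::regex(pattern));
}
```

With the path from the question, the pattern "FILE_(.+)_EVENT\\.DAT" should be found by `matches_anywhere` but rejected by `matches_whole`, because it does not cover the leading "/home/toto/". The question's pattern wraps the same expression in ".*" on both sides, so on a conforming library both helpers should succeed.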

Yan Foto

Your regex works fine (both match, which is correct) in VS 2012rc.

In g++ 4.7.1 (-std=gnu++11), if using:

  • ".*FILE_(.+)_EVENT\\.DAT.*", regex_match matches, but regex_search doesn't.
  • ".*?FILE_(.+?)_EVENT\\.DAT.*", neither regex_match nor regex_search matches (O_o).

All variants should match but some don't (for reasons that have been pointed out already by betabandido). In g++ 4.6.3 (-std=gnu++0x), the behavior is identical to g++ 4.7.1.

Boost (1.50) matches everything correctly with both pattern variants.

Summary:

                        regex_match      regex_search
 -----------------------------------------------------
 g++ 4.6.3 linux            OK/-               -
 g++ 4.7.1 linux            OK/-               -
 vs 2010                     OK                OK
 vs 2012rc                   OK                OK
 boost 1.50 win              OK                OK
 boost 1.50 linux            OK                OK
 -----------------------------------------------------

Regarding your pattern, if you mean a literal dot character '.', you should escape it ("\\."). You can also reduce backtracking by using non-greedy modifiers (?):

".*?FILE_(.+?)_EVENT\\.DAT.*"
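Beyond backtracking, the non-greedy modifier can also change what the group captures when the input holds more than one candidate. The following is a rough sketch, assuming a conforming `<regex>` implementation with the default ECMAScript grammar; `first_capture` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns the text captured by the first group of `pattern`,
// or an empty string if the pattern is not found in `input`.
std::string first_capture(const std::string& input, const std::string& pattern)
{
    const std::regex rgx(pattern);
    std::smatch match;
    if (std::regex_search(input, match, rgx))
        return match[1].str();
    return std::string();
}
```

For the path in the question, "FILE_(.+)_EVENT\\.DAT" and "FILE_(.+?)_EVENT\\.DAT" should both capture "mysymbol". For an input such as "FILE_a_EVENT.DAT FILE_b_EVENT.DAT", the greedy form extends to the last "_EVENT.DAT", while the non-greedy form stops at the first one and captures only "a".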
rubber boots

Looking through the latest libstdc++ source code for regex_search, you will find:

* @todo Implement this function.

Unfortunately this is not the only TODO item left. GCC's <regex> implementation is currently incomplete. I recommend using Boost or Clang and #ifdef the code until GCC has caught up.
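One possible way to #ifdef the code is sketched below. `USE_BOOST_REGEX` is a hypothetical macro that you would define yourself (for instance on the compiler command line) when building against a standard library whose `<regex>` is incomplete; `rx` and `extract_symbol` are likewise illustrative names, not part of either library.

```cpp
// USE_BOOST_REGEX is a hypothetical, user-defined macro (an assumption of
// this sketch): define it to route all regex calls through Boost.Regex.
#ifdef USE_BOOST_REGEX
#include <boost/regex.hpp>
namespace rx = boost;
#else
#include <regex>
namespace rx = std;
#endif
#include <string>

// Extracts the symbol between "FILE_" and "_EVENT.DAT",
// or returns an empty string if the path does not contain one.
std::string extract_symbol(const std::string& path)
{
    static const rx::regex rgx("FILE_(.+)_EVENT\\.DAT");
    rx::smatch match;
    if (rx::regex_search(path, match, rgx))
        return match[1].str();
    return std::string();
}
```

Because Boost.Regex and the standard library share the names `regex`, `smatch`, and `regex_search`, the namespace alias keeps the calling code identical in both configurations, so the switch can be dropped once the standard library implementation is usable.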

(This has not been fixed in the 4.8 branch either.)

mavam

I tried to use the regex library in C++11 and I ran into many problems (both using g++ 4.6 and 4.7). Basically, the support is either not there or there is only partial support. That is true even for the SVN version. Here you have a link describing the current status for the SVN version of libstdc++.

So, for the time being, I guess the best option is to continue using Boost.Regex.

Alternatively, you can try to use libc++. According to this document, support for regular expressions is complete.

betabandido