110

I'm working on a commercial (not open source) C++ project that runs on a Linux-based system. I need to use regular expressions within the C++ code. (I know: I now have 2 problems.)

QUESTION: What libraries do people who regularly do regex from C/C++ recommend I look into? A quick search has brought the following to my attention:

1) Boost.Regex (I need to go read the Boost Software License, but this question is not about software licenses)

2) C (not C++) POSIX regex (#include <regex.h>, regcomp, regexec, etc.)

3) http://freshmeat.net/projects/cpp_regex/ (I know nothing about this one; seems to be GPL, therefore not usable on this project)

durron597
Stéphane
  • 21
    In case anyone is looking at this old question for hints...a new library has shown up recently that deserves to be mentioned: Google's RE2: http://code.google.com/p/re2/ – Stéphane May 25 '10 at 16:55
  • 3
    [This](https://github.com/jpcre2/jpcre2) is a c++ wrapper for the new PCRE2 (revised version of PCRE) library. – Jahid Jan 01 '16 at 15:30

10 Answers

82

Boost.Regex is very good and is slated to become part of the C++0x standard (it's already in TR1).

Personally, I find Boost.Xpressive much nicer to work with. It is a header-only library and it has some nice features such as static regexes (regexes compiled at compile time).

Update: If you're using a C++11 compliant compiler (gcc 4.8 is NOT!), use std::regex unless you have good reason to use something else.
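For anyone who wants to see what the std::regex route looks like, here is a minimal sketch. It assumes a C++11 standard library with a working `<regex>`; the helper name `parse_key_value` and the `key = value` line format are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits a "key = value" line into its two parts.
// Returns false when the line does not have that shape.
bool parse_key_value(const std::string& line, std::string& key, std::string& value) {
    // ECMAScript is the default grammar for std::regex.
    // regex_match must consume the whole line, so no ^ or $ anchors are needed.
    static const std::regex pattern(R"(\s*(\w+)\s*=\s*(.*?)\s*)");

    std::smatch match;
    if (!std::regex_match(line, match, pattern)) {
        return false;
    }
    key = match[1].str();    // first capture group
    value = match[2].str();  // second capture group, trailing whitespace excluded
    return true;
}
```

Compiling the pattern once (the `static const` object) avoids rebuilding it on every call, which tends to matter more than the matching itself for short inputs.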

Mark Lakata
Ferruccio
  • 1
    A project I was personally involved with had to switch from Boost.Regex to PCRE because of binary compatibility issues (Boost's non-header-only libraries tend to suffer from inexplicable ABI breakage with minor releases and/or compiler option changes). However, if it's been absorbed into the C++ standard library that should cease to be a problem. – zwol Jan 09 '11 at 18:35
  • 5
    Boost is full of bugs and appears to lack coding standards and a QA process. It's not really suitable for production software. That includes its Regex gear, which uses C (rather than C++) in places and includes buffer overflows due to unsafe functions such as sprintf. When I reported a bunch of bugs after an audit, they remained "unacknowledged" months after the report. Use at your own risk. – jww Oct 25 '12 at 05:37
  • 1
    `std::regex` isn't available in `libstdc++` yet so I fall back to posix regex. – Matt Clarkson Feb 04 '13 at 13:29
  • 8
    Almost 5 years later, I tried today to use std::regex, but it turns out it hasn't yet been implemented in GCC. See http://stackoverflow.com/questions/15671536/why-does-this-c11-stdregex-example-throw-a-regex-error-exception – Stéphane Mar 28 '13 at 00:22
  • 2
    A good reason not to use std::regex, or boost::regex for that matter, would be that boost::regex is around 10 times slower than RE2 – Arsen Zahray Sep 30 '13 at 09:37
  • std::regex is still not complete - http://stackoverflow.com/a/12665408/452090 – hB0 Oct 29 '13 at 08:22
  • 3
    @jww Your comments are silly. sprintf does not necessarily demand a buffer overflow; it's entirely possible to use it and not have one occur, unlike (for example) gets. As well, boost is not "full of bugs"; while the occasional one crops up (as shown in its changelog), it is no more than any other large scale project. As for a lack of coding standards or QA process, that's categorically wrong, as a simple google search would show. Given boost has been absorbed into the standard itself, I would suggest your comments are not merely untrue, but categorically so. – Alice Aug 10 '14 at 00:18
  • 1
    @Alice - I'm not a FanBoi, so I don't share your enthusiasm. I actually performed the audit, and the overflow was there plain as day. Two of them if I recall correctly. If Boost had a mature engineering process, then it likely would have been caught during review at checkin. – jww Aug 10 '14 at 00:24
  • *"Given boost has been absorbed into the standard itself..."* - I'm not sure what that means. But if it means Boost's implementation is a reference implementation, then it probably means it's unaudited (and probably broken somewhere). – jww Aug 10 '14 at 00:27
  • 4
    @jww No, the C++ standard (C++03 TR, C++11 and C++1y) has decided to [incorporate several boost libraries into the standard](http://www.open-std.org/jtc1/sc22/wg21/docs/library_technical_report.html). That means, for all practical purposes, Boost *made* the standard. Making assertions without evidence using weasel words like "probably" and using personal attacks do nothing to change the fact that large parts of boost are now C++, and many of the people defining the modern direction of C++ are also working on boost. – Alice Aug 10 '14 at 00:59
  • 3
    @Alice - The C and C++ committees create standards. They don't incorporate libraries. I'm not aware of them ever producing a library. – jww Aug 10 '14 at 01:04
  • 3
    @jww [Except they did](http://www.open-std.org/jtc1/sc22/wg21/docs/library_technical_report.html) – Alice Aug 10 '14 at 01:05
  • @Alice - Please show me a C or C++ library produced by the committee with Boost code in it. It does not have to uptake all Boost code; just some Boost code will be fine. – jww Aug 10 '14 at 01:07
  • 3
    @jww [I already have three times now](http://www.open-std.org/jtc1/sc22/wg21/docs/library_technical_report.html) – Alice Aug 10 '14 at 01:08
24

Thanks for all the suggestions.

I tried out a few things today, and with the stuff we're trying to do, I opted for the simplest solution where I don't have to download any other 3rd-party library. In the end, I #include <regex.h> and used the standard C POSIX calls regcomp() and regexec(). Not C++, but in a pinch this proved to be the easiest.
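A rough sketch of what those POSIX calls look like when wrapped for use from C++. The helper name `posix_first_group` is hypothetical, and the choice of extended syntax (`REG_EXTENDED`) is an assumption; the answer does not say which flags were used.

```cpp
#include <regex.h>   // POSIX header, not part of the C++ standard library
#include <string>

// Hypothetical helper: runs the POSIX extended regex `pattern` against `text`
// and copies the first parenthesised group into `out`.
// Returns false if the pattern does not compile, does not match,
// or has no first group that took part in the match.
bool posix_first_group(const std::string& pattern, const std::string& text, std::string& out) {
    regex_t compiled;
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED) != 0) {
        return false;  // invalid pattern; nothing to free in this case
    }

    regmatch_t groups[2];  // groups[0] is the whole match, groups[1] the first group
    const int rc = regexec(&compiled, text.c_str(), 2, groups, 0);
    regfree(&compiled);    // always release the compiled pattern

    if (rc != 0 || groups[1].rm_so == -1) {
        return false;
    }
    const std::string::size_type start = static_cast<std::string::size_type>(groups[1].rm_so);
    const std::string::size_type length = static_cast<std::string::size_type>(groups[1].rm_eo - groups[1].rm_so);
    out = text.substr(start, length);
    return true;
}
```

Calling `posix_first_group("id=([0-9]+)", "user id=42 ok", out)` would leave the matched digits in `out`. Note that POSIX extended syntax has no `\d` or `\w`, so bracket expressions such as `[0-9]` are the portable choice.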

Stéphane
20

In C++ projects past, I have used PCRE with good success. It's very complete and well-tested since it's used in many high profile projects. And I see that Google has contributed a set of C++ wrappers for PCRE recently, too.

Greg Hewgill
16

C++ has had a built-in regex library since TR1. AFAIK, Boost's regex library is highly compatible with it and can be used as a replacement if your standard library doesn't provide TR1.
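One way to take advantage of that compatibility is a namespace alias, so the same code can be built against either library. This is only a sketch: `USE_BOOST_REGEX` is a made-up build flag (not a Boost or compiler macro), `looks_like_version` is a hypothetical helper, and the standard branch uses C++11's `<regex>` rather than the TR1 header.

```cpp
#include <string>

#ifdef USE_BOOST_REGEX        // hypothetical flag you would define in your own build
#include <boost/regex.hpp>
namespace rx = boost;         // rx::regex is boost::regex
#else
#include <regex>
namespace rx = std;           // rx::regex is std::regex
#endif

// Hypothetical helper: true for strings such as "1.36" or "1.36.0".
bool looks_like_version(const std::string& s) {
    // The pattern sticks to syntax that both libraries accept by default.
    static const rx::regex pattern("[0-9]+\\.[0-9]+(\\.[0-9]+)?");
    return rx::regex_match(s, pattern);
}
```

The two libraries are not identical in every corner (default grammars and extensions differ), so this works best when patterns stay within the common subset.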

Kasprzol
  • What compiler has TR1? My copy of g++ 4.1.2 (Debian Etch) does not have support for #include but thanks for bringing TR1 to my attention, I had forgotten. For others curious to know more on TR1 and C++0x, see http://en.wikipedia.org/wiki/Technical_Report_1 – Stéphane Oct 08 '08 at 07:36
  • As of SP1 Visual Studio 2008 has most of TR1, including regex. I know it doesn't help you on Linux, but others may be interested. Dinkumware also supports TR1 on gcc. – Michael Burr Oct 08 '08 at 08:18
  • As I wrote, if your std library doesn't have regex, then you can use boost: http://www.boost.org/doc/libs/1_36_0/doc/html/boost_tr1/subject_list.html#boost_tr1.subject_list.regex – Kasprzol Oct 08 '08 at 08:27
  • 3
    g++ 4.5.0. TR1 lives in tr1/regex. e.g.: #include <tr1/regex> – Ogre Psalm33 Feb 07 '11 at 22:24
11

Boost has regex in it.

That should fill the bill.

Robert Gould
  • Also appears to be slower than Google's RE2 http://lh3lh3.users.sourceforge.net/reb.shtml – Chad Oct 25 '12 at 14:21
11

Two more options:

If you can use C++11, work through this tutorial: http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339

Note: At the time of writing, the only C++11 regex library that I know works is the clang/llvm one, and it only works on Mac. GNU's standard library doesn't implement regex yet. I don't know about Visual Studio. Most people still use the boost regex implementation.


Or you can use Ragel to generate a finite state machine that does the parsing for you, along with its C/C++ implementation: http://www.complang.org/ragel/

I used it a little to generate code to parse JSON. This Ragel file: https://github.com/matiu2/yajp/blob/master/parser/number.rl is used to generate this code https://github.com/matiu2/yajp/blob/master/parser/json.hpp#L254 and this finite state machine diagram:

[Image: state diagram]
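To give a feel for the state-machine approach, here is a small hand-written recogniser. It is not Ragel output and not the grammar from the linked number.rl; the states, the function names, and the simplified pattern `-?[0-9]+(\.[0-9]+)?` are all assumptions chosen for illustration.

```cpp
#include <string>

// States of a tiny machine that accepts: -?[0-9]+(\.[0-9]+)?
enum class State { Start, Sign, Int, Dot, Frac, Error };

// One transition: given the current state and the next character,
// return the state the machine moves to.
State step(State s, char c) {
    const bool digit = (c >= '0' && c <= '9');
    switch (s) {
        case State::Start:
            if (c == '-') return State::Sign;
            return digit ? State::Int : State::Error;
        case State::Sign:
            return digit ? State::Int : State::Error;
        case State::Int:
            if (c == '.') return State::Dot;
            return digit ? State::Int : State::Error;
        case State::Dot:
            return digit ? State::Frac : State::Error;
        case State::Frac:
            return digit ? State::Frac : State::Error;
        case State::Error:
            return State::Error;  // once in Error, stay there
    }
    return State::Error;
}

// Feeds every character through the machine; the input is accepted
// only if the machine ends in one of the two accepting states.
bool is_decimal_number(const std::string& text) {
    State s = State::Start;
    for (char c : text) {
        s = step(s, c);
    }
    return s == State::Int || s == State::Frac;
}
```

A generator such as Ragel produces this kind of transition logic from a pattern description, so the table does not have to be maintained by hand.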


Update 1:

LLVM's libc++ regex works on Ubuntu 14.04: libc++-dev - LLVM C++ Standard library (development files). When compiling: clang++ -std=c++11 -lc++ -I/usr/include/c++/v1 ...

Update 2:

I'm currently enjoying boost spirit 3 - I like it more than regex, because it has BNF style rules and is well thought out. (Older (more documented) Spirit Qi libs found here)

matiu
7

You can also look at the fast regex library developed at Yandex, the search engine, for matching thousands of patterns against huge amounts of data.

6

I've personally always used boost.regex (although I don't have much need for regex in C++). Microsoft Research has a regex library too, called GRETA: http://research.microsoft.com/projects/greta/. Apparently it's very fast and supports the full Perl 5 syntax. I haven't used it, but you may want to test it out.

Roel
  • 8
    GRETA (http://research.microsoft.com/en-us/downloads/bd99f343-4ff4-4041-8293-34c054efe749/default.aspx) was made by Eric Niebler when he worked at Microsoft (1998-2001 from GRETA's header files). Eric Niebler then made in 2007 Boost.Xpressive. People should use Boost.Xpressive because it's newer and has a nicer license than "Microsoft Research end user license agreement" – Cristian Adam Sep 08 '09 at 15:14
  • 1
    Sorry, I don't see how pulling in the Boost library is a good thing. The last time I checked, the uncompressed local download of Boost was 400 MB. Not to mention the insane template madness you get with Boost. Sorry, I recommend Greg's answer. – Chad Oct 25 '12 at 11:54
  • http://lh3lh3.users.sourceforge.net/reb.shtml – Chad Oct 25 '12 at 14:21
  • 1
    @Chad Because boost is a well known and well regarded set of standard libraries that are helpful in many situations? If the download size is too big for you, just use bcp to strip anything you don't need; boost.regex is quite small when stripped in this manner. – Alice Aug 10 '14 at 00:19
4

I faced a similar situation and ended up using Henry Spencer's regexp engine: http://www.codeproject.com/KB/string/spencerregexp.aspx

MartinKahn
1

No one here has mentioned the one that comes with C++0x. If you are using a compiler and standard library that support C++0x, you could just use that instead of adding another library to your project.

RedX