
I would like to extract a substring that lies between two other substrings.
For example: /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
From these, I would like to get mysymbol and othersymbol.

I don't want to use Boost or other libraries, just standard C++, except for CERN's ROOT library, which has TRegexp, but I don't know how to use it...

Christian Rau
eouti

5 Answers


Since last year (with C++11), C++ has regular expressions built into the standard. This program shows how to use them to extract the string you are after:

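#include <regex>
#include <iostream>
#include <string>

int main()
{
    const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
    // Capture group 1 holds the word characters between "FILE_" and "_EVENT.DAT".
    std::regex rgx(".*FILE_(\\w+)_EVENT\\.DAT.*");
    std::smatch match;

    // s is const, so s.begin()/s.end() are const_iterators, as std::smatch expects.
    if (std::regex_search(s.begin(), s.end(), match, rgx))
        std::cout << "match: " << match[1] << '\n';
}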

It will output:

match: mysymbol

Note, though, that it will not work with GCC, as its library support for regular expressions is not very good. It works well in VS2010 (and probably VS2012), and should work with Clang.


By now (late 2016) all modern C++ compilers and their standard libraries are fully up to date with the C++11 standard, and most if not all of C++14 as well. GCC 6 and the upcoming Clang 4 support most of the coming C++17 standard as well.

Some programmer dude
  • Works well for simple text, but fails on text with embedded `\n` – FractalSpace May 07 '21 at 13:20
  • @FractalSpace The chances of having a filesystem path with a newline in it are rather slim, so I can live with that problem. ;) – Some programmer dude May 07 '21 at 14:17
  • The question however didn't specifically mention that the string is a filesystem path. – FractalSpace May 07 '21 at 14:45
  • Good, but this won't work in VS 2019. I solved this in VS 2019 with the following code: ```std::regex rgx(".*\"token\":\"([^\"]+)\".*"); std::smatch match; if (std::regex_search(res, match, rgx)) { std::cout << "match: " << match[1] << '\n'; } else { }``` – Max Base May 26 '21 at 13:43
  • Error of your code at VS 2019: `'bool std::regex_search(const std::basic_string<_Elem,_StTraits,_StAlloc> &,const std::basic_regex<_Elem,_RxTraits> &,std::regex_constants::match_flag_type)': expects 3 arguments - 4 provided` – Max Base May 26 '21 at 13:44
  • @MaxBase That's weird, because nothing has changed with [`std::regex_search`](https://en.cppreference.com/w/cpp/regex/regex_search) since the C++11 specification. The first overload in the linked reference is the one I'm using in my answer (with two iterators for the string). The one you're using is either overload 2 or 3, depending on what `res` is. – Some programmer dude May 26 '21 at 14:47
  • @MaxBase Can you please try to make a proper [mcve] of your failing program, and post a question about it? Then we could better help you solve the problem. – Some programmer dude May 26 '21 at 14:49

TRegexp only supports a very limited subset of regular expressions compared to other regex flavors. This makes constructing a single regex that suits your needs somewhat awkward.

One possible solution:

[^_]*_([^_]*)_

will match the string until the first underscore, then capture all characters until the next underscore. The relevant result of the match is then found in group number 1.

But in your case, why use a regex at all? Just find the first and second occurrence of your delimiter _ in the string and extract the characters between those positions.

Tim Pietzcker

If you want to use regular expressions, I'd really recommend using C++11's regexes or, if you have a compiler that doesn't yet support them, Boost. Boost is something I consider almost-part-of-standard-C++.

But for this particular question, you do not really need any form of regular expressions. Something like this sketch should work just fine, after you add all appropriate error checks (beg != npos, end != npos etc.), test code, and remove my typos:

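#include <string>

// Returns the text between `before` and the next `after` in `in`.
// Sketch only: the npos checks are still missing.
std::string between(std::string const &in,
                    std::string const &before, std::string const &after) {
  std::string::size_type beg = in.find(before);
  beg += before.size();
  std::string::size_type end = in.find(after, beg);
  return in.substr(beg, end-beg);
}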

Obviously, you could change the std::string to a template parameter and it should work just fine with std::wstring or less commonly used instantiations of std::basic_string as well.

Christopher Creutzig
  • And the reason for not using the C++ standard library's regex functionality (which is not just almost-part-of-standard-C++ but actually completely-part-of-standard-C++) instead of Boost's is...? But you're right that regexes may be overkill anyway; then again, who knows what strange conditions he wants to search the substring by (though he didn't state anything). – Christian Rau Jul 24 '12 at 09:31
  • Not everyone is lucky enough to be in a C++11 friendly environment. (I happen not to be, though I'd love to, which is why I tend not to remember all the new good stuff we're supposed to have at our fingertips.) Boost works just fine for a lot of older compilers, too. – Christopher Creutzig Jul 25 '12 at 08:09
  • A simple mention of the potential non-necessity of a Boost dependence would have been enough. – Christian Rau Jul 25 '12 at 08:12
  • Well, you did that just fine. :-) As I said, I just forgot C++11 has this. But I'll edit my answer accordingly. – Christopher Creutzig Jul 25 '12 at 08:14

This is a good candidate, though I would study corner cases before trusting it:

std::string text = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex reg("(.*)(FILE_)(.*)(_EVENT\\.DAT)(.*)");
std::cout << std::regex_replace(text, reg, "$3") << '\n';
Max Base
oguz

The answers of Some programmer dude, Tim Pietzcker, and Christopher Creutzig are cool and correct, but to me they did not seem very obvious for beginners.

The following function is an attempt to create an auxiliary illustration for Some programmer dude and Tim Pietzcker's answers:

void ExtractSubString(const std::string& start_string
    , const std::string& string_regex_extract_substring_template)
{
    std::regex regex_extract_substring_template(
        string_regex_extract_substring_template);

    std::smatch match;

    std::cout << std::endl;

    std::cout << "A substring extract template: " << std::endl;
    std::cout << std::quoted(string_regex_extract_substring_template) 
        << std::endl;

    std::cout << std::endl;

    std::cout << "Start string: " << std::endl;
    std::cout << start_string << std::endl;

    std::cout << std::endl;

    if (std::regex_search(start_string.begin(), start_string.end()
       , match, regex_extract_substring_template))
    {
        std::cout << "match0: " << match[0] << std::endl;
        std::cout << "match1: " << match[1] << std::endl;
        std::cout << "match2: " << match[2] << std::endl;
    }

    std::cout << std::endl;
}

The following overloaded function is an attempt to help illustrate Christopher Creutzig's answer:

void ExtractSubString(const std::string& start_string
    , const std::string& before_substring, const std::string& after_substring)
{
    std::cout << std::endl;

    std::cout << "A before substring: " << std::endl;
    std::cout << std::quoted(before_substring) << std::endl;

    std::cout << std::endl;

    std::cout << "An after substring: " << std::endl;
    std::cout << std::quoted(after_substring) << std::endl;

    std::cout << std::endl;

    std::cout << "Start string: " << std::endl;
    std::cout << start_string << std::endl;

    std::cout << std::endl;

    size_t before_substring_begin 
        = start_string.find(before_substring);
    size_t extract_substring_begin 
        = before_substring_begin + before_substring.size();
    size_t extract_substring_end 
        = start_string.find(after_substring, extract_substring_begin);

    std::cout << "Extract substring: " << std::endl;
    std::cout
    << start_string.substr(extract_substring_begin
       , extract_substring_end - extract_substring_begin)
    << std::endl;

    std::cout << std::endl;
}

This is the main function that runs the overloaded functions (the #include lines belong at the top of the file, before both functions):

#include <regex>
#include <iostream>
#include <iomanip>
#include <string>

int main()
{
    const std::string start_string 
        = "/home/toto/FILE_mysymbol_EVENT.DAT";

    const std::string string_regex_extract_substring_template(
        ".*FILE_(\\w+)_EVENT\\.DAT.*");
    const std::string string_regex_extract_substring_template2(
        "[^_]*_([^_]*)_");

    ExtractSubString(start_string, string_regex_extract_substring_template);

    ExtractSubString(start_string, string_regex_extract_substring_template2);

    const std::string before_substring = "/home/toto/FILE_";
    const std::string after_substring = "_EVENT.DAT";

    ExtractSubString(start_string, before_substring, after_substring);
}

This is the result of executing the main function:

A substring extract template: 
".*FILE_(\\w+)_EVENT\\.DAT.*"

Start string: 
/home/toto/FILE_mysymbol_EVENT.DAT

match0: /home/toto/FILE_mysymbol_EVENT.DAT
match1: mysymbol
match2: 


A substring extract template: 
"[^_]*_([^_]*)_"

Start string: 
/home/toto/FILE_mysymbol_EVENT.DAT

match0: /home/toto/FILE_mysymbol_
match1: mysymbol
match2: 


A before substring: 
"/home/toto/FILE_"

An after substring: 
"_EVENT.DAT"

Start string: 
/home/toto/FILE_mysymbol_EVENT.DAT

Extract substring: 
mysymbol
Vitalii