25

Is there a regex replacement function that passes each match to a user-supplied function and substitutes the return value?

I've tried this method, but it obviously doesn't work:

cout << regex_replace("my values are 9, 19", regex("\\d+"), my_callback);

and function:

std::string my_callback(std::string &m) {
  int int_m = atoi(m.c_str());
  return std::to_string(int_m + 1);
}

and the result should be: my values are 10, 20

I'm looking for something that works like PHP's preg_replace_callback or Python's re.sub(pattern, callback, subject).

I'm using the latest GCC, 4.9, which supports regex without Boost.

rsk82

4 Answers

27

I wanted this kind of function and didn't like the answer "use boost". The problem with Benjamin's answer is that it provides all the tokens, so you don't know which token is a match, and it doesn't let you use capture groups. This does:

// clang++ -std=c++11 -stdlib=libc++ -o test test.cpp
#include <algorithm>  // std::for_each
#include <cstdlib>
#include <iostream>
#include <iterator>   // std::advance
#include <string>
#include <regex>

namespace std
{

template<class BidirIt, class Traits, class CharT, class UnaryFunction>
std::basic_string<CharT> regex_replace(BidirIt first, BidirIt last,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    std::basic_string<CharT> s;

    typename std::match_results<BidirIt>::difference_type
        positionOfLastMatch = 0;
    auto endOfLastMatch = first;

    auto callback = [&](const std::match_results<BidirIt>& match)
    {
        auto positionOfThisMatch = match.position(0);
        auto diff = positionOfThisMatch - positionOfLastMatch;

        auto startOfThisMatch = endOfLastMatch;
        std::advance(startOfThisMatch, diff);

        s.append(endOfLastMatch, startOfThisMatch);
        s.append(f(match));

        auto lengthOfMatch = match.length(0);

        positionOfLastMatch = positionOfThisMatch + lengthOfMatch;

        endOfLastMatch = startOfThisMatch;
        std::advance(endOfLastMatch, lengthOfMatch);
    };

    std::regex_iterator<BidirIt> begin(first, last, re), end;
    std::for_each(begin, end, callback);

    s.append(endOfLastMatch, last);

    return s;
}

template<class Traits, class CharT, class UnaryFunction>
std::string regex_replace(const std::string& s,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    return regex_replace(s.cbegin(), s.cend(), re, f);
}

} // namespace std

using namespace std;

std::string my_callback(const std::smatch& m) {
  int int_m = atoi(m.str(0).c_str());
  return std::to_string(int_m + 1);
}

int main(int argc, char *argv[])
{
    cout << regex_replace("my values are 9, 19", regex("\\d+"),
        my_callback) << endl;

    cout << regex_replace("my values are 9, 19", regex("\\d+"),
        [](const std::smatch& m){
            int int_m = atoi(m.str(0).c_str());
            return std::to_string(int_m + 1);
        }
    ) << endl;

    return 0;
}
Violet Giraffe
John Martin
  • 2
    +1 for the solution, but you need to use a namespace other than `std`. Currently your example has [undefined behavior](https://timsong-cpp.github.io/cppwp/n3337/namespace.std), since you're _overloading_ `std::regex_replace`, not _specializing_ it. – andreee May 06 '19 at 08:17
  • 1
    @andree is correct - this solution is helpful assuming you never plan to use std::regex_replace with string replacements; otherwise, the compiler will throw an error due to ambiguity. – Evan Hendler Apr 14 '20 at 17:12
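
Following up on the two comments above: one way to avoid adding overloads to `std` is to put the helper in a namespace of your own. What follows is a minimal sketch, not the answer's code. It assumes a hypothetical namespace `rx` and function name `replace_with` (neither appears in the answer) and is restricted to `std::string` for brevity. The callback still receives the full `std::smatch`, so capture groups remain available.

```cpp
#include <cassert>
#include <regex>
#include <string>

namespace rx {  // hypothetical namespace, not part of the answer above

// Replaces every match of `re` in `s` with the string returned by `f(match)`.
template <class UnaryFunction>
std::string replace_with(const std::string& s, const std::regex& re,
                         UnaryFunction f)
{
    std::string out;
    auto tail = s.cbegin();  // start of the text not yet copied to `out`
    for (std::sregex_iterator it(s.cbegin(), s.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(tail, m[0].first);  // unmatched text before this match
        out.append(f(m));              // whatever the callback returns
        tail = m[0].second;            // continue after this match
    }
    out.append(tail, s.cend());        // unmatched text after the last match
    return out;
}

}  // namespace rx

// Usage: swap the two capture groups of each "name=number" pair.
void swap_pairs_example()
{
    const std::string result = rx::replace_with(
        "a=1, b=22", std::regex("(\\w+)=(\\d+)"),
        [](const std::smatch& m) { return m[2].str() + "=" + m[1].str(); });
    assert(result == "1=a, 22=b");
}
```

Because the name differs from `regex_replace`, calls to `std::regex_replace` with a format string elsewhere in the program are unaffected.
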
13

You could use a regex_token_iterator:

#include <iostream>
#include <algorithm>
#include <regex>
#include <string>
#include <sstream>

int main()
{
    std::string input_text = "my values are 9, 19";
    std::string output_text;
    auto callback = [&](std::string const& m){
        std::istringstream iss(m);
        int n;
        if(iss >> n)
        {
            output_text += std::to_string(n+1);
        }
        else
        {
            output_text += m;
        }
    };

    std::regex re("\\d+");
    std::sregex_token_iterator
        begin(input_text.begin(), input_text.end(), re, {-1,0}),
        end;
    std::for_each(begin,end,callback);

    std::cout << output_text;
}

Note that the {-1,0} in the iterator constructor's argument list specifies the submatches we want to iterate over. The -1 selects the non-matching sections between matches, and the 0 selects each entire match (submatch 0).
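
To make the token order concrete, here is a small sketch that collects the tokens produced for `{-1, 0}`. The helper name `tokens_of` is hypothetical and only serves the illustration. Non-matching text and matches alternate, and nothing in a token says which kind it is.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects the tokens produced for the submatch list {-1, 0}:
// -1 yields the text between matches, 0 yields each whole match.
std::vector<std::string> tokens_of(const std::string& text, const std::regex& re)
{
    std::vector<std::string> tokens;
    std::sregex_token_iterator it(text.begin(), text.end(), re, {-1, 0}), end;
    for (; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}

// Usage: the question's input splits into text, number, text, number.
void tokens_example()
{
    const std::vector<std::string> expected = {"my values are ", "9", ", ", "19"};
    assert(tokens_of("my values are 9, 19", std::regex("\\d+")) == expected);
}
```
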

Also note that I have not used the C++11 regex functionality extensively and am no expert in it, so there may be problems with this code. But for your specific input, I tested it and it seems to produce the expected results. If you find any input for which it doesn't work, please let me know.

Benjamin Lindley
  • 1
    It works. But I think Boost is a better solution. http://stackoverflow.com/questions/11508798/conditionally-replace-regex-matches-in-string – duleshi Sep 18 '15 at 09:15
3

Maybe I'm late to this party (about 5 years, I think), but I didn't like the "use boost" answer either. The following function is less general (in terms of string types), but apparently works. However, I don't know whether using a std::ostringstream is better than std::string::append:

std::string regex_replace(
    const std::string& input,
    const std::regex& regex, 
    std::function<std::string(std::smatch const& match)> format) {

    std::ostringstream output;
    std::sregex_iterator begin(input.begin(), input.end(), regex), end;
    for(; begin != end; begin++){
        output << begin->prefix() << format(*begin);
    }
    output << input.substr(input.size() - begin->position());
    return output.str();
}

So, as you can see I used std::sregex_iterator instead of std::sregex_token_iterator.

zatarain
  • I'm not too up to date with C++ standards but is this `std::function` a function def that can be used elsewhere, like whatever the format points to ? –  Dec 07 '19 at 17:03
  • 1
    I've been writing C++ (old standard) for a long time and I can't stand to see block braces on the same line as code... just me! –  Dec 07 '19 at 17:05
  • 1
    Hi zatarain, a nice idea, but it didn't work out of the box. It seems to cause problems if the match is at the beginning of the string. I tried adding a separate index integer, updated as the last statement inside the for loop: `subStrStartIndex = begin->position() + begin->length()`. Then you can use it for the output stream in the for loop: `output << input.substr(subStrStartIndex, begin->position() - subStrStartIndex) << format(*begin);` This also avoids dereferencing `begin` outside the for loop, where it is no longer valid: `output << input.substr(subStrStartIndex);` – Flow Rei Ser Jan 09 '20 at 07:20
  • 1
    this doesn't work. You can't call `begin->position()` when `begin == end` – phuclv Oct 14 '20 at 03:58
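
As the comments above point out, dereferencing `begin` once it equals `end` is invalid, so the tail of the input has to be found some other way. Here is a minimal sketch of one possible fix that keeps the answer's `std::ostringstream` approach: remember the last match and append its `suffix()`. The name `regex_replace_cb` is hypothetical, chosen to avoid clashing with `std::regex_replace`.

```cpp
#include <cassert>
#include <functional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical name; replaces each match with the result of `format(match)`.
std::string regex_replace_cb(
    const std::string& input,
    const std::regex& re,
    const std::function<std::string(const std::smatch&)>& format)
{
    std::ostringstream output;
    std::smatch last;  // copy of the most recent match, if any
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        // prefix() is the text between the previous match and this one.
        output << it->prefix() << format(*it);
        last = *it;
    }
    if (last.empty())
        output << input;          // no match at all: copy the input unchanged
    else
        output << last.suffix();  // text after the final match
    return output.str();
}

// Usage: increment every number, as in the question.
void increment_example()
{
    const std::string result = regex_replace_cb(
        "my values are 9, 19", std::regex("\\d+"),
        [](const std::smatch& m) { return std::to_string(std::stoi(m.str()) + 1); });
    assert(result == "my values are 10, 20");
}
```
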
-1

That kind of functionality only exists in the Boost library version of regex_replace, which accepts a custom formatter. Unfortunately, the standard C++11 version requires the replacement format argument to be a string.

Here is the documentation on regex_replace: http://www.cplusplus.com/reference/regex/match_replace/
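
For illustration, this is roughly what the standard string-format form allows: the replacement can refer to capture groups such as `$1`, but it cannot run code on them. A small sketch, where the helper name `bracket_numbers` is hypothetical:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Wraps each number in brackets using only a format string.
// "$1" refers to the first capture group in the default ECMAScript format.
std::string bracket_numbers(const std::string& text)
{
    static const std::regex re("(\\d+)");
    return std::regex_replace(text, re, "[$1]");
}

// Usage: the digits can be moved around, but not incremented.
void bracket_example()
{
    assert(bracket_numbers("my values are 9, 19") == "my values are [9], [19]");
}
```
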

blockchaindev