20

I am trying to replace certain patterns in a string with different replacement patterns.

Example:

string test = "test replacing \"these characters\"";

What I want to do is replace all ' ' with '_' and all other non letter or number characters with an empty string. I have the following regex created and it seems to tokenize correctly, but I am not sure how to (if possible) perform a conditional replace using regex_replace.

string test = "test replacing \"these characters\"";
regex reg("(\\s+)|(\\W+)");

expected result after replace would be:

string result = "test_replacing_these_characters";

EDIT: I cannot use boost, which is why I left it out of the tags. So please no answer that includes boost. I have to do this with the standard library. It may be that a different regex would accomplish the goal or that I am just stuck doing two passes.

EDIT2: I did not remember what characters were included in \w at the time of my original regex, after looking it up I have further simplified the expression. Again the goal is anything matching \s+ should be replaced with '_' and anything matching \W+ should be replaced with empty string.

ildjarn
pstrjds
  • Why did you drop the last `"`-char in your example output? – rubber boots Jul 16 '12 at 17:20
  • @rubberboots - because only white space should be replaced with an underscore, any other non letter and digit character should be replaced with nothing. – pstrjds Jul 16 '12 at 17:24
  • I see, so you'll want to have different replacement texts in one pass. This won't work afaik in C++ regex. If somebody knows a trick for this, I'd like to use that too ;-) – rubber boots Jul 16 '12 at 17:52
  • @rubberboots - Yep, that is the reason for my question and I figure I will end up finding out - "You can't do that", but I figured I could ask and hope that somebody smart would have a solution. – pstrjds Jul 16 '12 at 17:55
  • I found a method with callback functions that, unfortunately, isn't actually working in my C++11 implementation (g++ 4.6.1, VS2012) (but works in boost). – rubber boots Jul 16 '12 at 19:18

2 Answers

28

The C++ (C++0x, C++11, TR1) regular expressions do not yet work in every implementation (stackoverflow; look up the phrase regex on this page for gcc), so it may be better to use Boost for a while.

You can check whether your compiler supports the regular expressions needed:

#include <string>
#include <iostream>
#include <regex>

using namespace std;

int main(int argc, char * argv[]) {
    string test = "test replacing \"these characters\"";
    regex reg("[^\\w]+");
    test = regex_replace(test, reg, "_");
    cout << test << endl;
}

The above works in Visual Studio 2012 RC.

Edit 1: To replace by two different strings in one pass (depending on the match), I'd think this won't work here. In Perl, this could easily be done within evaluated replacement expressions (/e switch).

Therefore, you'll need two passes, as you already suspected:

 ...
 string test = "test replacing \"these characters\"";
 test = regex_replace(test, regex("\\s+"), "_");
 test = regex_replace(test, regex("\\W+"), "");
 ...
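
As a rough sketch, the two passes above can be wrapped in a small helper. The function name `sanitize_two_pass` is made up for illustration, and the code assumes a standard library whose `<regex>` is actually functional. Note that the order of the passes matters: `\W` also matches whitespace, so the whitespace pass has to run first.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this sketch only).
// Pass 1 turns every whitespace run into '_'.
// Pass 2 deletes every remaining run of non-word characters.
std::string sanitize_two_pass(const std::string& input) {
    static const std::regex whitespace("\\s+");
    static const std::regex non_word("\\W+");

    // Whitespace first: \W matches whitespace too, so swapping the
    // passes would delete the spaces instead of replacing them.
    const std::string with_underscores =
        std::regex_replace(input, whitespace, "_");

    // '_' counts as a word character, so the underscores survive this pass.
    return std::regex_replace(with_underscores, non_word, "");
}
```

Calling `sanitize_two_pass("test replacing \"these characters\"")` should give `test_replacing_these_characters`.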

Edit 2:

If regex_replace accepted a callback function tr(), you could choose the substitution there, like:

 string output = regex_replace(test, regex("\\s+|\\W+"), tr);

with tr() doing the replacement work:

 string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }

With such an overload, the problem would be solved. Unfortunately, std::regex_replace in C++11 has no overload that takes a callback, but Boost has one. The following works with Boost and uses one pass:

...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...

string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr);   // <= works in Boost
...

Maybe some day this will work with C++11 or whatever number comes next.
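
In the meantime, the callback idea can be approximated with the standard library alone by walking the matches with `std::sregex_iterator` and building the output by hand. The following is only a sketch under a few assumptions: the helper name `sanitize_single_pass` is invented here, and the second alternative is written as `[^\w\s]+` rather than `\W+`. That change is deliberate: with `\W+`, a match that starts on punctuation would also swallow any whitespace following it, and that whitespace would then be dropped instead of becoming `_`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this sketch only).
// Emulates a replacement callback in one pass over the input:
// group 1 (whitespace run) becomes '_', group 2 (punctuation run) becomes "".
std::string sanitize_single_pass(const std::string& input) {
    // Assumption: [^\w\s]+ is used instead of \W+ so that group 2
    // never consumes whitespace that belongs to group 1.
    static const std::regex pattern("(\\s+)|([^\\w\\s]+)");

    std::string output;
    auto copy_from = input.cbegin();   // start of the text not yet copied
    const std::sregex_iterator end;    // default-constructed end iterator

    for (std::sregex_iterator it(input.cbegin(), input.cend(), pattern);
         it != end; ++it) {
        const std::smatch& m = *it;
        // Copy the unmatched text between the previous match and this one.
        output.append(copy_from, m[0].first);
        // Pick the replacement based on which capture group matched.
        if (m[1].matched) {
            output += '_';
        }
        // Group 2 matched otherwise: append nothing.
        copy_from = m[0].second;
    }
    // Copy whatever follows the last match.
    output.append(copy_from, input.cend());
    return output;
}
```

For the string in the question, `sanitize_single_pass("test replacing \"these characters\"")` should return `test_replacing_these_characters`. Whether this is actually faster than two calls to `regex_replace` is not something this sketch measures.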

Regards

rbo

rubber boots
  • I don't want to replace " with underscore, it should be replaced with nothing. That is the crux of my issue, I want to replace the first match group with _ and the second match group with empty string. I should also have mentioned that I cannot use boost. – pstrjds Jul 16 '12 at 17:27
  • Your second edit that runs in VS2012 still does not solve my issue. The whitespace must be replaced with _ and all other non letter and number characters must be replaced with empty string – pstrjds Jul 16 '12 at 17:36
  • The two-pass version does this on my system, the result is `test_replacing_these_characters`. – rubber boots Jul 16 '12 at 18:04
-1

This is commonly done by using four backslashes, so that a literal backslash survives the escaping applied by the C++ string literal. You then need a second pass for the quotation marks, which are escaped in the regex at that point and only then.

string tet = "test replacing \"these characters\"";
//regex reg("[^\\w]+");
regex reg("\\\\"); //--AS COMMONLY TAUGHT AND EXPLAINED
tet = regex_replace(tet, reg, " ");
cout << tet << endl;

regex reg2("\""); //--AS SHOWN
tet = regex_replace(tet, reg2, " "); 
cout << tet << endl;

And in a single pass, use:

string tet = "test replacing \"these characters\"";
//regex reg("[^\\w]+");
regex reg3("\\\""); //--AS EXPLAINED
tet = regex_replace(tet, reg3, "");
cout << tet << endl;
  • This does not answer the question. The question was whether there was a way (back in 2012) for me to replace all space characters with underscores and all non letter and number characters with the empty string. I was hoping to do it in a single pass. Your answer also does not output the correct result which is: test_replacing_these_characters – pstrjds May 10 '21 at 22:16