The goal is to turn this JSON:

{"secretWord1":"private", "something": "\"secretWord2\":\"privateToo\""}

into this, using std::regex_replace:

{"secretWord1":"****", "something": "\"secretWord2\":\"****\""}

I have the following code with three regular expressions:

std::regex regex1(R"~((\\\"|")((?:[^\\"]*)(?:secretWord1|secretWord2))\1:\1([^\\"]*)\1)~", std::regex_constants::icase);
std::regex regex2(R"~((\\\")((?:[^\\"]*)(?:secretWord1|secretWord2))\1:\1([^\\"]*)\1)~", std::regex_constants::icase);
std::regex regex3(R"~((")((?:[^\\"]*)(?:secretWord1|secretWord2))\1:\1([^\\"]*)\1)~", std::regex_constants::icase);

std::string replaced = someJsonData;
replaced = std::regex_replace(replaced, regex1, "$1$2$1:$1****$1");
replaced = std::regex_replace(std::regex_replace(replaced, regex2, "$1$2$1:$1****$1"), regex3, "$1$2$1:$1****$1");

I want to mask the secret values with asterisks. The first regex fails with:

error_stack: regex_error(error_stack): There was insufficient memory to determine whether the regular expression could match the specified character sequence.

Is there something wrong with the first expression? The other two expressions complement each other and together do the same job as regex1, yet they work fine when I run them.

I can't provide sample input that makes it fail, but the file isn't large (around 30 kB). When I tried it with data from a JSON generator, regex1 was noticeably slower than the combination of regex2 and regex3.
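For context, the message above is what a std::regex_error with code error_stack reports. A minimal sketch of catching it, so that a failed replacement does not abort the logging path, might look like the following. Note that try_replace is a hypothetical helper name used only for illustration, not something from my real code.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: applies `pattern` to `input` and stores the result in
// `output`. Returns false and leaves `output` untouched if the regex engine
// throws (for example with error_stack or error_complexity).
bool try_replace(const std::string& input, const std::regex& pattern,
                 const std::string& format, std::string& output) {
    try {
        output = std::regex_replace(input, pattern, format);
        return true;
    } catch (const std::regex_error& e) {
        // Distinguish "the engine ran out of resources" from other failures.
        const bool resource_limit =
            e.code() == std::regex_constants::error_stack ||
            e.code() == std::regex_constants::error_complexity;
        std::cerr << (resource_limit ? "regex resource limit hit: "
                                     : "regex error: ")
                  << e.what() << '\n';
        return false;
    }
}
```

With something like this, the caller could fall back to the regex2 + regex3 combination whenever try_replace returns false for regex1.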

Konrad Rudolph
Filip Procházka
  • Is there a reason why you are avoiding the use of a JSON library? https://stackoverflow.com/q/3512650/2191572 Is it because your JSON is invalid? – MonkeyZeus Aug 19 '19 at 14:44
  • Unfortunately, I can't use your recommended library. The json string is valid. – Filip Procházka Aug 19 '19 at 15:03
  • Wrong, that string is invalid JSON. You can try validating it at https://jsonlint.com/. I can only assume that none of those libraries work for you because your JSON is invalid. – MonkeyZeus Aug 19 '19 at 15:23
  • I'm sorry, you're right, my provided code was invalid. I already fixed it. But the real code which I'm testing is valid. – Filip Procházka Aug 19 '19 at 16:02
  • Thanks for being willing to reflect on my comments. Please see my answer. – MonkeyZeus Aug 19 '19 at 16:16
  • @FilipProcházka *Why* can’t you use a JSON library? If at all possible, your problem *should* be solved by a JSON library, not by a regular expression. – Konrad Rudolph Aug 20 '19 at 13:39

1 Answer

I don't know much about C++ or the memory issue, but this seems to match pretty well:

(\\?"(secretword1|secretword2)\\?":\\?")(.*?)(\\?")

https://regex101.com/r/T8pY0V/2
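As a rough sketch of how this pattern could be plugged into std::regex_replace (I have not tested it against your real data): mask_secrets is a hypothetical helper name, and the replacement format "$1****$4" is my assumption about how to keep the key and quotes while replacing only the value.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for illustration): masks the values of the
// keys secretWord1 / secretWord2, whether the quotes are plain or escaped.
std::string mask_secrets(const std::string& json) {
    // Group 1: optional backslash, quote, key, closing quote, colon, opening quote.
    // Group 3: the value, matched lazily up to the next (optionally escaped) quote.
    // Group 4: the closing quote of the value.
    static const std::regex pattern(
        R"~((\\?"(secretword1|secretword2)\\?":\\?")(.*?)(\\?"))~",
        std::regex_constants::icase);
    // Keep the prefix (group 1) and the closing quote (group 4); replace the value.
    return std::regex_replace(json, pattern, "$1****$4");
}
```

Called on the example JSON from the question, this yields the desired masked output. It does not allow whitespace around the colon, and a value that itself contains a quote will be cut at the first one.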


Do note that I strongly suggest getting a JSON library but this regex could work in a pinch. It is up to you to figure out all of the edge cases where it fails.

MonkeyZeus
  • It looks much better than what I posted. I will try it tomorrow. I really appreciate your advice about using an existing solution, but as I said above, that isn't possible. – Filip Procházka Aug 19 '19 at 16:29
  • Sorry for the late reply, I was a bit busy. It works well; I just edited it a bit for my purposes. I realize that using a regex instead of a JSON library has limitations. It is only used for logging, so in the worst case it will replace something other than private information. Many thanks! – Filip Procházka Aug 21 '19 at 18:18
  • @FilipProcházka You're welcome. Please see my updated answer and regex101 example because I've edited it to properly ignore stuff like `"ssecretWord1":"private"`. Good luck! – MonkeyZeus Aug 21 '19 at 18:40