RegExp not working as expected while removing some string

Question

I am extracting data from some websites and would like to remove unnecessary text from it.

So I wrote some parsers that clean up the parsed content before presenting it to the users.

Here is my test code.

// tried using this but it still did not work
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

var div = document.getElementById("content");
var txArray = ["If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.", "KobatoChanDaiSuki"]
txArray.forEach(x => {
  var reg = new RegExp(escapeRegex(x), "gi");
  div.innerHTML = div.innerHTML.replace(reg, "");
});

<div id="content">
Hyung : Big/older brother. Kind of an equivalent to the japanese “onii-san” but only used between male (male to male).
Translator :Pumba
TL Check : KobatoChanDaiSuki
 If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.
</div>

As you can see above, it is not removing all of the content. Why is that?

Maybe I need to break up the long string and then try to clean it? I really do not know. What do you think?

edemaine · Accepted Answer · 2021-08-06T15:06:21.403

The problem is that (, ), and . have special meanings in JavaScript regular expressions. An additional problem is that < and > are written as &lt; and &gt; respectively in innerHTML, so a pattern containing the literal characters never matches there. innerText avoids this problem. (I figured this out by adding console.log(div.innerHTML) to look at the contents; see the snippet below.)

Try this:

var txArray = ["If you find any errors \\( broken links, non-standard content, etc\\.\\. \\), Please let us know < report chapter > so we can fix it as soon as possible\\.", "KobatoChanDaiSuki"]
txArray.forEach(x => {
  var reg = new RegExp(x, "gi");
  div.innerText = div.innerText.replace(reg, "");
});

Or you can write code to escape your regular expressions, as in the following:

var reg = new RegExp(x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), "gi");

var div = document.getElementById("content");
var txArray = ["If you find any errors \\( broken links, non-standard content, etc\\.\\. \\), Please let us know < report chapter > so we can fix it as soon as possible\\.", "KobatoChanDaiSuki"];

txArray.forEach(x => {
  var reg = new RegExp(x, "gi");
  console.log(div.innerHTML);
  div.innerText = div.innerText.replace(reg, "");
});

<div id="content">
Hyung : Big/older brother. Kind of an equivalent to the japanese “onii-san” but only used between male (male to male).
Translator :Pumba
TL Check : KobatoChanDaiSuki
 If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.
</div>
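
To check both points without a browser, here is a minimal sketch that works on plain strings. The helper name removePhrases and the shortened sample strings are assumptions made for illustration; asText and asHtml stand in for what innerText and innerHTML would return for the same element.

```javascript
// Escape every regex metacharacter so the string is matched literally.
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: removes each phrase from the text, ignoring case.
function removePhrases(text, phrases) {
  return phrases.reduce(
    (result, phrase) => result.replace(new RegExp(escapeRegex(phrase), "gi"), ""),
    text
  );
}

// Shortened sample phrases (assumed for this sketch).
const phrases = [
  "Please let us know < report chapter > so we can fix it.",
  "KobatoChanDaiSuki",
];

// Stand-in for innerText: < and > appear as literal characters.
const asText =
  "TL Check : KobatoChanDaiSuki\nPlease let us know < report chapter > so we can fix it.";
// Stand-in for innerHTML: < and > are serialized as entities.
const asHtml =
  "TL Check : KobatoChanDaiSuki\nPlease let us know &lt; report chapter &gt; so we can fix it.";

const cleanedText = removePhrases(asText, phrases);
const cleanedHtml = removePhrases(asHtml, phrases);

console.log(JSON.stringify(cleanedText));
console.log(JSON.stringify(cleanedHtml));
```

With the text version both phrases are removed. With the HTML version only the short phrase is removed, because the long phrase contains < and > while the string contains the entities instead.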

Not working. I tried what you wrote and it's not working. Please try to create a simple runnable snippet here on Stack Overflow. — Alen.Toma, Aug 06 '21 at 14:58
See above: I updated the code as you said and it is still not working. — Alen.Toma, Aug 06 '21 at 15:02
Ah, < and > are causing trouble too. I've updated my answer. — edemaine, Aug 06 '21 at 15:06
I do not want to write `\\` manually. Could you fix the `escapeRegex` function I wrote instead, so I can escape those unwanted characters in txArray? — Alen.Toma, Aug 06 '21 at 15:10
That code should work if you switch from `innerHTML` to `innerText`. — edemaine, Aug 06 '21 at 15:11
