
I am extracting data from some websites and would like to remove some unnecessary text from it.

So I wrote some parsers that clean up the parsed content before presenting it to the users.

Here is my test code:

// I tried using this, but it still did not work
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

var div = document.getElementById("content");
var txArray = ["If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.", "KobatoChanDaiSuki"]
txArray.forEach(x => {
  var reg = new RegExp(escapeRegex(x), "gi");
  div.innerHTML = div.innerHTML.replace(reg, "");
});
<div id="content">
Hyung : Big/older brother. Kind of an equivalent to the japanese “onii-san” but only used between male (male to male).
Translator :Pumba
TL Check : KobatoChanDaiSuki
 If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.
</div>

As seen above, it does not remove all of the unwanted content. Why is that?

Maybe I need to break the long string into pieces and then try to clean it? I really do not know. What do you think?

Alen.Toma

1 Answer


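Characters such as (, ), and . have special meanings in JavaScript regular expressions, so they must be escaped before a string is used as a literal pattern. An additional problem is that < and > in text are serialized as &lt; and &gt; respectively in innerHTML, so a pattern containing the literal characters never matches there. innerText returns the decoded text and avoids this problem. (I figured this out by adding console.log(div.innerHTML) to look at the contents; see the snippet below.)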
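To see the entity issue without a browser, here is a minimal sketch that runs in plain Node. The two strings are hypothetical stand-ins for what innerHTML and innerText would return for the same element (they are assumptions for illustration, not values read from a real DOM), and the phrase is shortened for readability.

```javascript
// Hypothetical stand-ins for the same element's content:
// innerHTML serializes "<" and ">" in text as entities, innerText returns decoded text.
const asInnerHTML = "Please let us know &lt; report chapter &gt; so we can fix it.";
const asInnerText = "Please let us know < report chapter > so we can fix it.";

// The phrase we want to remove, written the way it appears on the page.
const phrase = "Please let us know < report chapter > so we can fix it.";

// Escape regex metacharacters so the phrase is matched literally.
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// No "g" flag here, so test() keeps no state between calls.
const reg = new RegExp(escapeRegex(phrase), "i");

const matchesMarkup = reg.test(asInnerHTML);
const matchesText = reg.test(asInnerText);

console.log(matchesMarkup); // false: the markup contains "&lt;", not "<"
console.log(matchesText);   // true: the decoded text contains the literal characters
```

So even with correct escaping, the pattern cannot match the serialized markup; it only matches the decoded text.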

Try this:

var txArray = ["If you find any errors \\( broken links, non-standard content, etc\\.\\. \\), Please let us know < report chapter > so we can fix it as soon as possible\\.", "KobatoChanDaiSuki"]
txArray.forEach(x => {
  var reg = new RegExp(x, "gi");
  div.innerText = div.innerText.replace(reg, "");
});

Or you can escape the strings in code before building the regular expression (in which case txArray should hold the original, unescaped strings), as in the following:

var reg = new RegExp(x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), "gi");
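Putting the escaping approach together, a small helper might look like the sketch below. The name removePhrases and the sample text are hypothetical (they are not from the question); the helper works on plain strings, so it can be tried outside the browser.

```javascript
// Escape regex metacharacters so a phrase is matched literally.
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical helper: remove every occurrence of each phrase, ignoring case.
function removePhrases(text, phrases) {
  return phrases.reduce(
    (result, phrase) => result.replace(new RegExp(escapeRegex(phrase), "gi"), ""),
    text
  );
}

// Shortened sample text, assumed for illustration.
const sample =
  "TL Check : KobatoChanDaiSuki\n" +
  "If you find any errors ( broken links, etc.. ), let us know.";

const cleaned = removePhrases(sample, [
  "If you find any errors ( broken links, etc.. ), let us know.",
  "kobatochandaisuki", // different case on purpose; the "i" flag still matches
]);

console.log(JSON.stringify(cleaned)); // prints "TL Check : \n"
```

In the page itself this would presumably be used as div.innerText = removePhrases(div.innerText, txArray), with txArray holding the unescaped strings.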

var div = document.getElementById("content");
var txArray = ["If you find any errors \\( broken links, non-standard content, etc\\.\\. \\), Please let us know < report chapter > so we can fix it as soon as possible\\.", "KobatoChanDaiSuki"];

txArray.forEach(x => {
  var reg = new RegExp(x, "gi");
  console.log(div.innerHTML);
  div.innerText = div.innerText.replace(reg, "");
});
<div id="content">
Hyung : Big/older brother. Kind of an equivalent to the japanese “onii-san” but only used between male (male to male).
Translator :Pumba
TL Check : KobatoChanDaiSuki
 If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.
</div>
edemaine