
I am trying to build an HTML parser that replaces something like alert("hi there123"); with "" (an empty string).
So this:

alert(123); var hello = 0;
console.log(hello);

will become:

var hello = 0;
console.log(hello);

Someone suggested using regex, so I dove into that. This is what I found online:
something(.*?)something will grab the text between the words "something" and "something", but when I try to do it with the word "alert(" (notice the "("), it doesn't work.

P.S. What I am trying to do is first remove the text between the parentheses of alert() and then remove the remaining alert(); like so: string.replace("alert();", "");

Also, does a regex loop over every occurrence?

If anyone has a better solution, please let me know.

Any help would be appreciated.

chris85
Gerrit Luimstra
  • Parentheses in a regular expression have special meaning (they mark groups). If you don't want the special meaning and instead want to match the parenthesis, you need to escape it. Regular expressions, like good knives, are very sharp tools that you can hurt yourself with. Be careful and learn all about them before misusing them. – Ross Presser Aug 04 '16 at 13:44
  • @RossPresser So you advise against using regex for such tasks? If so, what would be a better approach? Thank you for the explanation. – Gerrit Luimstra Aug 04 '16 at 13:45
  • Regular Expressions should ***not*** be used to parse HTML. See this question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – War10ck Aug 04 '16 at 13:47
  • I think Ross's point is to not just use regular expressions you find. You should look at the parts and understand what each part does. If you use something incorrectly it might have serious consequences. A good place to start is regex101.com, which will give an explanation for each part of a regex (be sure to select the correct language on the left panel). – chris85 Aug 04 '16 at 13:51
  • Any parser is pointless for protecting against JS code (though I'm sure you could get close). I could easily do `var s='ale'+'r'; eval(s+'t("fail")');` - so, are you going to parse all possible combinations? You'd have to parse out `eval` also, but then there's `Function`... ;) Also, what about `alert.call(null, 'failed again')` ;) – James Wilkins Aug 05 '16 at 23:30
  • Oh, then there's also `var a=alert; a('failed yet again'); a.call(null, '...and again')`... ;) – James Wilkins Aug 05 '16 at 23:37
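To make the last two comments concrete: a regex only sees text, so it cannot follow an alias. A small check, assuming the lazy pattern alert\(.*?\); used in the answers (the name stripAlerts is illustrative only):

```javascript
// Illustrative helper: remove every literal alert(...); call from the source text.
const stripAlerts = (src) => src.replace(/alert\(.*?\);/g, "");

// A direct call is removed.
console.log(JSON.stringify(stripAlerts('alert("x");'))); // prints ""

// Neither an aliased call nor alert.call(...) contains the literal text "alert(",
// so both survive untouched.
const aliased = 'var a = alert; a("failed yet again");';
const viaCall = "alert.call(null, 'failed again');";
console.log(stripAlerts(aliased) === aliased); // prints true
console.log(stripAlerts(viaCall) === viaCall); // prints true
```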

3 Answers


You should have a regex like this:

/(alert\((.*)\);)/g

Escaping the parentheses is essential here.

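. matches any character except a line break.

* matches the preceding token zero or more times (greedily, as many as possible).

\ escapes the following character, so \( matches a literal (.

g modifier: global flag; matches every occurrence instead of stopping at the first one.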
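A quick check of the pattern, including the effect of the g flag. One caveat: .* is greedy, so when a single line holds several calls it runs to the last ); on that line, whereas the lazy .*? (the form the question itself uses) stops at the first. The input string is made up for illustration.

```javascript
// Made-up input: two alert calls and one call that must survive.
const line = "alert(1); console.log(2); alert(3);";

const greedy = /(alert\((.*)\);)/g;    // the pattern from this answer
const lazy = /(alert\((.*?)\);)/g;     // same, but .*? stops at the first ");"
const lazyOnce = /(alert\((.*?)\);)/;  // no g flag: only the first match is replaced

// Greedy .* runs to the LAST ");" on the line, so console.log(2); is removed too.
console.log(JSON.stringify(line.replace(greedy, "")));   // prints ""

// Lazy + g removes each alert call and keeps everything in between.
console.log(JSON.stringify(line.replace(lazy, "")));     // prints " console.log(2); "

// Without g, replace stops after the first match.
console.log(JSON.stringify(line.replace(lazyOnce, ""))); // prints " console.log(2); alert(3);"
```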

Sample here: https://regex101.com/r/fD8tQ8/1

Didier Aupest

You need to escape the ( and ):

alert\(.*?\);

Demo: https://regex101.com/r/sL6fM6/1
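If the delimiters come from variables (as in the question's something(.*?)something idea), escaping by hand does not scale. A minimal sketch: escape the delimiter strings first, then build the pattern with new RegExp. The escaping expression is the one shown in MDN's regular-expressions guide; the helper names escapeRegExp and textBetween are illustrative, not part of any library.

```javascript
// Escape every character that has a special meaning in a regex, so that
// "alert(" is matched literally (the expression shown in MDN's regex guide).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative helper: the text between `start` and the next `end`,
// or null when there is no match.
function textBetween(source, start, end) {
  const pattern = new RegExp(escapeRegExp(start) + "(.*?)" + escapeRegExp(end));
  const match = source.match(pattern);
  return match ? match[1] : null;
}

console.log(escapeRegExp("alert("));                               // prints alert\(
console.log(textBetween('alert("hi there123");', "alert(", ");")); // prints "hi there123" (quotes included)
```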

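Note also that you can do it in one go: a single replace removes the whole alert(...); call, so there is no need to clear the text between the parentheses first.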

Important note: this can fail when the expression within alert(...) itself contains ");", for example alert(f(function () { g(); }));, because the lazy .*? stops at the first ");". That case needs a trickier regex.
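Applied to the input from the question, a single global replace does the whole job; the last two checks probe the caveat. With the lazy pattern, inner parentheses only cause trouble when the sequence ); occurs inside the call, since that is where .*? stops.

```javascript
const pattern = /alert\(.*?\);/g;

// The example from the question: one replace removes the whole call.
// Note the leading space that remains before "var".
const input = "alert(123); var hello = 0;\nconsole.log(hello);";
console.log(input.replace(pattern, ""));

// Inner parentheses are fine as long as ");" does not appear inside the call...
console.log(JSON.stringify("alert(f(1));".replace(pattern, ""))); // prints ""

// ...but a nested statement ending in ");" stops the lazy match early.
console.log("alert(f(function () { g(); }));".replace(pattern, "")); // prints  }));
```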

Dmitry Egorov

You have to escape some characters to achieve this.

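Escape the parentheses for the regex.
And if you write the text to match as a double-quoted string for .replace(), escape the double quotes inside it; inside a regex literal they need no escaping.

And finally, to test your regexes, here is a nice regex tester: https://regex101.com/#javascript

Something like this should work for you (['"] matches either kind of quote; a | inside [...] would just be a literal pipe):

string.replace(/alert\(['"](.*?)['"]\);/g, "");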

Tested in this CodePen.
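Two properties of the quote-based approach are easy to check. Requiring a quote after alert( means only calls whose argument is a string literal are removed. And inside [...] the | is an ordinary character, so a class written ['|"] also accepts a pipe. In this sketch, `quoted` uses the class ['"] with a lazy .*?, and `withPipe` keeps the pipe in the class for comparison; the inputs are made up.

```javascript
const quoted = /alert\(['"](.*?)['"]\);/g;   // class ['"]: single or double quote
const withPipe = /alert\(['|"](.*)['|"]\);/g; // class ['|"]: quote OR a literal "|"

// Calls with a single string-literal argument are removed.
console.log(JSON.stringify(`alert("hi there123"); alert('x');`.replace(quoted, ""))); // prints " "

// Calls with any other argument are left alone.
console.log("alert(123);".replace(quoted, "")); // prints alert(123);

// With the pipe in the class, "|x|" is treated as if it were quoted.
console.log(JSON.stringify("alert(|x|);".replace(withPipe, ""))); // prints ""
```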

Louys Patrice Bessette