The string:

<a href="javascript:void(0)"; onclick="window.location="mailto:"+this.innerHTML.split("").reverse().join("");" style="direction:rtl;unicode-bidi:bidi-override;">link</a>

The goal: match all quotes (6) from the onclick attribute:

window.location="mailto:"+this.innerHTML.split("").reverse().join("");

Thanks!

Ain Tohvri

2 Answers


If you just need a regex that matches every quote character, /"/g should work fine. In JavaScript:

var str = 'window.location="mailto:"+this.innerHTML.split("").reverse().join("");';
// you can get this string from anywhere.

str.match(/"/g);
// returns an array with one entry per quote character found (or null if there are none).

On its own that is not very useful, but you have not said what you want to do with the matches.

Or, if you want to get everything that is in quotes, use the regex /"(.*?)"/g.

Explanation:

  • Matches a quote character
  • Does a lazy match for any character
  • Stops at the next quote character.

The capture group holds the text between each pair of quotes, so you can do something useful with it.
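As a minimal sketch of how the two patterns behave on the string from the question (assuming an environment with `String.prototype.matchAll`, i.e. ES2020 or later; the variable names are only illustrative):

```javascript
// The onclick value from the question, as a plain string.
const str = 'window.location="mailto:"+this.innerHTML.split("").reverse().join("");';

// Every quote character on its own.
const quotes = str.match(/"/g);

// Everything between each pair of quotes; capture group 1 holds the content.
const contents = Array.from(str.matchAll(/"(.*?)"/g), (m) => m[1]);

console.log(quotes.length); // 6
console.log(contents);      // [ 'mailto:', '', '' ]
```

The lazy `.*?` is what keeps each match from running on to a later quote; a greedy `.*` would swallow everything up to the last quote in the string.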

rvighne
  • Thanks for the answer. Resorting to a Web API to get the attribute would indeed work, but the question was meant to be solved purely with RegEx. – Ain Tohvri Feb 12 '14 at 00:45
  • @AinTohvri What is "Web API"? I assume you mean the DOM. That's OK, it'll actually work on any string. I used the DOM because you talked about attributes. – rvighne Feb 12 '14 at 00:47
  • As said, the question is meant to be solved entirely with RegEx. Instead of a `getAttribute()` call I'm simply using `/onclick="(.*)"\s/g`; the question is whether it's possible to do it in one go, without chaining regexes or resorting to an HTML element attribute call for the preliminary match… – Ain Tohvri Feb 12 '14 at 00:53
  • @AinTohvri Oh! I understand, so you want the contents of the `onclick` attribute. It's really bad practice to go about that with regex if you can avoid it. If you can't, then the regex you have now is appropriate, but I would recommend removing the `\s` at the end. – rvighne Feb 12 '14 at 00:58
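
The chained approach from these comments, and one possible single-pass variant, can be sketched as follows. Both are only a sketch for this particular input: they assume the `onclick` value ends at the last `"` that is followed by whitespace. The single-pass pattern is not from the thread; it is an illustration that additionally assumes a regex engine with lookbehind support (ES2018 in JavaScript). The variable names are made up.

```javascript
const html = '<a href="javascript:void(0)"; onclick="window.location="mailto:"+this.innerHTML.split("").reverse().join("");" style="direction:rtl;unicode-bidi:bidi-override;">link</a>';

// Chained: first isolate the attribute value, then match the quotes inside it.
const attr = /onclick="(.*)"\s/.exec(html);
const chained = attr ? attr[1].match(/"/g) : null;

// Single pass (hypothetical): a quote that has `onclick="` somewhere before it
// and a quote followed by whitespace somewhere after it.
const singlePass = html.match(/(?<=onclick=".*)"(?=.*"\s)/g);

console.log(attr[1]);
console.log(chained.length, singlePass.length); // 6 6
```

Neither version can tell, in general, a quote that ends the attribute from one inside the JavaScript, so a later attribute whose closing quote is followed by whitespace would break both.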

Assuming you're trying to fix errors because somebody forgot to escape the quotes in their JS, try this regex:

(onclick=")(.*?)(?<!\\)"(.*?)(?<!\\)"

and replace with

$1$2\\"$3\\"

This escapes the next pair of double quotes that aren't already escaped; repeat as necessary.

Since this is invalid HTML and you're trying to parse it with regular expressions, this is probably the best you'll be able to do to replace said text.
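
A minimal sketch of applying this pattern in JavaScript, assuming an engine with lookbehind support (ES2018). In a JavaScript string literal the replacement is written `'$1$2\\"$3\\"'`, which inserts a single backslash before each quote. The loop count of 3 is an assumption specific to this input (three pairs of inner quotes), and the variable names are illustrative.

```javascript
const html = '<a href="javascript:void(0)"; onclick="window.location="mailto:"+this.innerHTML.split("").reverse().join("");" style="direction:rtl;unicode-bidi:bidi-override;">link</a>';

// Same pattern as above: the lookbehinds skip quotes that already have a backslash.
const pattern = /(onclick=")(.*?)(?<!\\)"(.*?)(?<!\\)"/;

let fixed = html;
// Assumption: exactly three unescaped quote pairs sit inside the onclick value.
for (let i = 0; i < 3; i++) {
  fixed = fixed.replace(pattern, '$1$2\\"$3\\"');
}

console.log(fixed);
```

The pass count has to be known in advance: on this input a fourth pass would treat the attribute's own closing quote and the opening quote of `style` as the next pair and escape those too.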

tommoyang
  • Care to explain that extremely long regex? Reading through it is difficult so you could provide some insight as to what it does. – rvighne Feb 12 '14 at 01:10
  • I actually screwed it up, I'll rewrite it with an explanation asap – tommoyang Feb 12 '14 at 01:12
  • I'm stumped. So far I'm at >>> onclick="(.*?)(?<!\\)"(.*?)(?<!\\)".*?(.*?)"([^"]*) <<< but I can't figure out how to avoid matching the following attribute – tommoyang Feb 12 '14 at 01:27
  • Just saying, you're trying to parse invalid HTML. (not your fault really but you did post an answer). Just stop crazily generating regexes and think for a minute: look at the input string. You might have noticed that your brain has a harder time processing what constitutes the `onclick`, because the signal--an ending double quote--is not actually right. The thing that told you where to stop was the `style=`. How do you code that in? You can't. Remember that `style=` can also appear in JS code so this is really **impossible**. – rvighne Feb 12 '14 at 01:42
  • Point taken. I know about parsing invalid HTML (and even valid HTML) but I wanted to have a go at it anyway to see where I ended up. – tommoyang Feb 12 '14 at 01:50
  • Thanks for the contribution. Everyone knows it's invalid input, but there are scenarios where these circumstances have to be dealt with: minification, API method argument expectations, or something along those lines. There's always the consideration of solving things in a robust way to guarantee stability one way or another. This time, for me, it is in favour of RegEx, which is why I posed the question :) – Ain Tohvri Feb 12 '14 at 12:43