-1

I want to get matches while checking the expressions on both sides of the main match.

var str = '1234 word !!! 5678 another *** 000more)))'; // goal: get "word" and "another"

console.log(str.match(/(?!\d+\s?)\w+(?=\s?\W+)/g))
>> (3) ["word", "another", "more"]

It checks both sides but does not include them in the match results.

But with HTML it does not work:

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>'; // goal: get url, url2 and url3

console.log(str.match(/(?!href=")[^"]+?(?=")/g))
>> (6) ["<a href=", "url", "></a><a href=", "url2", "></a><a href=", "url3"]

I tried a negative lookahead, (?!href="), and a positive lookahead, (?="), to match only the value of the attribute, but it returns other parts of the string as well.

Is there any way to do something like this? Thanks.

Kos

3 Answers

0

For your example data, you could capture what is between the double quotes in a capturing group, href="([^"]+), and loop through the results:

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
var pattern = /href="([^"]+)/g;
var match = pattern.exec(str);
while (match != null) {
  console.log(match[1]);
  match = pattern.exec(str);
}
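If a single expression is preferred over the loop, one possible sketch, assuming an engine that provides `String.prototype.matchAll` (ES2020), collects the first capture group of every match. The helper name `extractHrefs` is made up for this illustration.

```javascript
// Hypothetical helper (not part of the original answer):
// collect capture group 1 of every match into an array.
function extractHrefs(html) {
  var pattern = /href="([^"]+)/g; // same pattern as above; matchAll requires the g flag
  return Array.from(html.matchAll(pattern), function (m) {
    return m[1]; // m[0] is the whole match, m[1] is the attribute value
  });
}

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
console.log(extractHrefs(str)); // ["url", "url2", "url3"]
```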
The fourth bird
  • That needs extra processing to get my result. –  Jan 28 '18 at 10:36
  • Is there any way to get my result with the `match` function? :) I'm really looking for an answer that uses `match`. –  Jan 28 '18 at 10:37
  • I think [this post](https://stackoverflow.com/questions/9214754/what-is-the-difference-between-regexp-s-exec-function-and-string-s-match-fun) will explain the use of exec instead of match. This regex is based on your provided string but the answer from @Niet the Dark Absol is better when you can use a DOM parser. – The fourth bird Jan 28 '18 at 11:08
0

In other flavors of regex you could have used a positive lookbehind, (?<=href="), but JavaScript regex did not support lookbehinds when this answer was written (they were standardized later, in ES2018).
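In an engine that does implement lookbehind assertions (an assumption about your runtime, so check your targets first), a sketch using `match` alone might look like this:

```javascript
var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';

// (?<=href=") asserts that href=" comes right before the match
// without making it part of the match itself.
var urls = str.match(/(?<=href=")[^"]+/g);

console.log(urls); // ["url", "url2", "url3"]
```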

A reasonable solution is:

  • Match href=" as "ordinary" content, to be ignored.
  • Match the attribute value as a capturing group ((\w+)), to be "consumed".
  • Set the boundary of the above group with a positive lookahead ((?=")), just as you did.

So the whole regex can be:

href="(\w+)(?=")

and read "your" value from group 1.
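As a rough illustration of reading group 1, the sketch below runs that regex with `exec` in a loop; the variable names are only for this example.

```javascript
var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
var pattern = /href="(\w+)(?=")/g;
var values = [];
var m;

// exec() advances pattern.lastIndex on each call because of the g flag
while ((m = pattern.exec(str)) !== null) {
  values.push(m[1]); // group 1 holds the attribute value
}

console.log(values); // ["url", "url2", "url3"]
```

Note that `\w+` only covers letters, digits and underscores, so a real URL containing `/`, `:` or `.` would need a wider class such as `[^"]+`.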

Valdi_Bo
  • It's a shame JS doesn't have lookbehinds. They're useful... But anyway, if you're already only taking group 1, there's no real benefit to using a lookahead for the last `"`, just match it outside the group. – Niet the Dark Absol Jan 28 '18 at 10:51
0

You can't parse HTML with regex. Because HTML can't be parsed by regex.

Have you tried using the DOM parser that's right at your fingertips?

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
var div = document.createElement('div');
div.innerHTML = str; // parsing magic!
var links = Array.from(div.getElementsByTagName("a"));
var urls = links.map(function(a) {return a.href;});
// above returns fully-resolved absolute URLs.
// for the literal attribute value, try a.getAttribute("href")
console.log(urls);
Niet the Dark Absol