Regex: check both sides of a match without including them in the matched string

Question

I want to get a match while checking the expressions on both sides of the main match, without including them in the result.

var str = '1234 word !!! 5678 another *** 000more)))'; // goal: get "word", "another" and "more"

console.log(str.match(/(?!\d+\s?)\w+(?=\s?\W+)/g))
>> (3) ["word", "another", "more"]

It checks both sides but does not include them in the match results.

But with HTML it does not work:

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>'; // goal: get url, url2 and url3

console.log(str.match(/(?!href=")[^"]+?(?=")/g))
>> (6) ["<a href=", "url", "></a><a href=", "url2", "></a><a href=", "url3"]

I tried using a negative lookahead, (?!href="), and a positive lookahead, (?="), to match only the attribute values, but it returns other parts of the string as well.

Is there any way to do this? Thanks.

score 0 · Accepted Answer · answered Jan 28 '18 at 10:25

What you could do for your example data is capture what is between the double quotes with a capturing group, href="([^"]+), and loop through the results:

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
var pattern = /href="([^"]+)/g;
var match = pattern.exec(str);
while (match != null) {
  console.log(match[1]);
  match = pattern.exec(str);
}
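
If a single expression is preferred over the exec loop, the same capturing group can probably also be collected with String.prototype.matchAll. This is a minimal sketch, assuming an engine that supports matchAll (ES2020); the variable name hrefs is only illustrative:

```javascript
var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';

// matchAll requires the g flag and yields one match object per hit;
// index 1 of each match object is the first capturing group.
var hrefs = Array.from(str.matchAll(/href="([^"]+)/g), function (m) {
  return m[1];
});

console.log(hrefs); // ["url", "url2", "url3"]
```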

answered Jan 28 '18 at 10:25

The fourth bird

That needs extra processing to get my result. – Jan 28 '18 at 10:36
Is there any way to get my result with the `match` function? :) I'm really looking for an answer that uses `match`. – Jan 28 '18 at 10:37
I think [this post](https://stackoverflow.com/questions/9214754/what-is-the-difference-between-regexp-s-exec-function-and-string-s-match-fun) will explain the use of exec instead of match. This regex is based on your provided string but the answer from @Niet the Dark Absol is better when you can use a DOM parser. – The fourth bird Jan 28 '18 at 11:08

score 0 · Answer 2 · answered Jan 28 '18 at 10:45

In other regex flavors you could have used a positive lookbehind, (?<=href="), but JavaScript regex did not support lookbehinds before ES2018, so they are unavailable in older engines.
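
In an engine that does implement lookbehind assertions (part of the language since ES2018), the lookbehind form can be used directly with match. This is a minimal sketch under that assumption; engines without lookbehind support throw a SyntaxError on this pattern, and the variable name values is only illustrative:

```javascript
var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';

// (?<=href=") asserts that href=" precedes the match without consuming it;
// (?=") asserts that a closing double quote follows.
var values = str.match(/(?<=href=")[^"]+(?=")/g);

console.log(values); // ["url", "url2", "url3"]
```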

A reasonable solution is:

Match href=" as "ordinary" content, to be ignored.
Match the attribute value as a capturing group ((\w+)), to be "consumed".
Set the boundary of the above group with a positive lookahead ((?=")), just as you did.

So the whole regex can be:

href="(\w+)(?=")

and read "your" value from group 1.
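
Reading group 1 could look roughly like the sketch below. The helper name collectGroup1 is hypothetical and only there for illustration; it assumes the pattern carries the g flag:

```javascript
// Hypothetical helper: returns group 1 of every match of a global regex.
function collectGroup1(pattern, text) {
  if (!pattern.global) {
    // Without the g flag, exec would keep returning the first match forever.
    throw new Error("pattern must use the g flag");
  }
  var results = [];
  var match;
  pattern.lastIndex = 0; // always start scanning from the beginning
  while ((match = pattern.exec(text)) !== null) {
    results.push(match[1]); // match[0] is the whole match, match[1] is group 1
  }
  return results;
}

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
var found = collectGroup1(/href="(\w+)(?=")/g, str);

console.log(found); // ["url", "url2", "url3"]
```

Note that \w+ only covers letters, digits and the underscore, so an attribute value containing characters such as / or . would need a broader class such as [^"]+.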

answered Jan 28 '18 at 10:45

Valdi_Bo

It's a shame JS doesn't have lookbehinds. They're useful... But anyway, if you're already only taking group 1, there's no real benefit to using a lookahead for the last `"`, just match it outside the group. – Niet the Dark Absol Jan 28 '18 at 10:51

score 0 · Answer 3 · answered Jan 28 '18 at 10:50

You can't parse HTML with regex. Because HTML can't be parsed by regex.

Have you tried using the DOM parser that's right at your fingertips?

var str = '<a href="url"></a><a href="url2"></a><a href="url3"></a>';
var div = document.createElement('div');
div.innerHTML = str; // parsing magic!
var links = Array.from(div.getElementsByTagName("a"));
var urls = links.map(function(a) {return a.href;});
// above returns fully-resolved absolute URLs.
// for the literal attribute value, try a.getAttribute("href")
console.log(urls);
