I'm trying to understand this code:
function extractLinks(input) {
    var html = input.join('\n');
    var regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;
    var match;
    while (match = regex.exec(html)) {
        var hrefValue = match[3];
        if (hrefValue == undefined) {
            var hrefValue = match[4];
        }
        if (hrefValue == undefined) {
            var hrefValue = match[5];
        }
        console.log(hrefValue);
    }
}
As far as I can tell, this is a simple function that extracts all href values, but only the real ones: an href that appears as class="href", or outside an A tag, and so on, is not included.
The weird part is that the regex I created for this task is
(<a[\s\S]*?>)
but when I couldn't find a solution and looked at the original one, I found this very long regex. I tried the solution with my own regex, and it doesn't work.
Could someone please explain how I should interpret this long regex?
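For comparison, here is a minimal sketch of what the short regex captures. The sample string is made up for illustration (it is not the real test input), and the names sample, simple and tags are my own. It suggests that the short pattern grabs the whole opening tag rather than the href value, and that it also matches a tag that has no href attribute at all:

```javascript
// Hypothetical sample string, shortened from the kind of input shown further down.
var sample = '<a class="selected" href=/courses>Courses</a><a id="href">x</a>';

// The short regex: "<a", then as few characters as possible, then ">".
var simple = /(<a[\s\S]*?>)/g;

var tags = [];
var m;
// exec() returns null once there are no more matches, which ends the loop.
while ((m = simple.exec(sample)) !== null) {
    tags.push(m[1]); // group 1 is the entire opening tag
}

console.log(tags);
```

So with this pattern, the href value would still have to be pulled out of each captured tag in a second step.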
As for match, it ends up holding an array, fine.
Let me see if I get the idea of this while loop:
while (match = the regex is found in the string) { something = match[3] /* why 3? */; if undefined, something = match[4]; if undefined again, something = match[5]; }
I really struggle to understand the mechanism behind all of this, as well as the logic in the regex.
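To make the group numbering concrete, here is a small sketch that runs the long regex over three made-up tags, one per quoting style. The sample string and the names sample, rows, single, double and bare are assumptions of mine for illustration. Groups are numbered by the position of their opening parenthesis: 1 is the optional text before href, 2 is the whole value alternation, 3 is the single-quoted value, 4 the double-quoted value, and 5 the unquoted value. Only one of 3, 4 and 5 takes part in any given match, and the others are undefined:

```javascript
// Same regex as in the function above, copied unchanged.
var regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;

// Hypothetical sample: single-quoted, double-quoted and unquoted href values.
var sample = "<a href='/forum' >" +
             '<a href="/" id="home">' +
             '<a class="selected" href=/courses>';

var rows = [];
var match;
// With the g flag, each exec() call resumes where the previous match ended
// (tracked in regex.lastIndex) and returns null when nothing is left.
while ((match = regex.exec(sample)) !== null) {
    rows.push({
        single: match[3], // value between '...'
        double: match[4], // value after an opening "
        bare: match[5]    // value without quotes
    });
}

console.log(rows);
```

One detail worth noting: the double-quoted alternative, as written, has no closing quote, so the closing " is consumed by the trailing [^>]*> instead. The captured value in group 4 is still the text between the quotes.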
The input is generated by a system that will parse 10 different arrays of strings, but let's take the one I use for testing. The code below is passed in as an array of strings, where every line is a separate element of the array, and that array is the input argument of the function.
<!DOCTYPE html>
<html>
<head>
<title>Hyperlinks</title>
<link href="theme.css" rel="stylesheet" />
</head>
<body>
<ul><li><a href="/" id="home">Home</a></li><li><a
class="selected" href=/courses>Courses</a>
</li><li><a href =
'/forum' >Forum</a></li><li><a class="href"
onclick="go()" href= "#">Forum</a></li>
<li><a id="js" href =
"javascript:alert('hi yo')" class="new">click</a></li>
<li><a id='nakov' href =
http://www.nakov.com class='new'>nak</a></li></ul>
<a href="#empty"></a>
<a id="href">href='fake'<img src='http://abv.bg/i.gif'
alt='abv'/></a><a href="#"><a href='hello'></a>
<!-- This code is commented:
<a href="#commented">commentex hyperlink</a> -->
</body>