RegEx for matching the first instance of a URL

Question

Say I have the HTML in a string variable htmlString and I want to find the first instance of an mp3 link in the html, and store that link in a variable.

<html>
...
src="https://example.com/mp3s/2342344?id=24362456"
...
</html>

The link https://example.com/mp3s/2342344?id=24362456 will be extracted.

Note there are lots of other urls in the html, but I just want the one in this format.

How do I get this?

You usually wouldn't use regex to handle HTML, one approach is to scan for URL's and extract these. — Andreas Louv, May 04 '19 at 16:42
Possible duplicate of https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string — Andreas Louv, May 04 '19 at 16:43

Emma · Answer 1 · 2019-05-04T20:13:52.197

While it is not usually recommended to parse HTMLs using regular expressions, this expression might help you to design an expression, if you wish/have to get the first mp3 URL.

^(src=\x22(https:\/\/[a-z]+.com\/mp3s\/[0-9]+\?id=[0-9]+)\x22)[\s\S]*

I have added several boundaries to it, just to be safe, which you can simply remove it from or simplify it in the second capturing group where your desired URL is:

 (https:\/\/[a-z]+.com\/mp3s\/[0-9]+\?id=[0-9]+)

The key is that to add a [\s\S]* such that it would pass everything else after capturing the first URL.

Graph

This graph shows how it would work:

JavaScript Demo with 10 million times performance benchmark

repeat = 10000000;

start = Date.now();

for (var i = repeat; i >= 0; i--) {
 var string = 'src=\"https://example.com/mp3s/2342344?id=24362456\" src=\"https://example.com/mp3s/08103480132984?id=0a0f8ad0f8\" src=\"https://example.com/mp3s/2342344?id=24362456\" href=\"https://example.com/mp3s/2342344?id=91847890\" src=\"https://example.com/mp3s/2342344?id0980184\"';
 var regex = /^(src=\x22(https:\/\/[a-z]+.com\/mp3s\/[0-9]+\?id=[0-9]+)\x22)[\s\S]*/g;

 var match = string.replace(regex, "$2");
}

end = Date.now() - start;

console.log(match + " is a match  ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test.  ");

RegEx for matching the first instance of a URL

1 Answers1

Graph

JavaScript Demo with 10 million times performance benchmark