
Say I have some HTML in a string variable htmlString, and I want to find the first instance of an mp3 link in that HTML and store the link in a variable.

<html>
...
src="https://example.com/mp3s/2342344?id=24362456"
...
</html>

The link https://example.com/mp3s/2342344?id=24362456 should be extracted.

Note that there are lots of other URLs in the HTML, but I only want the one in this format.

How do I get this?

Emma
cannyboy

1 Answer


Parsing HTML with regular expressions is not usually recommended, but if you wish or have to get the first mp3 URL this way, the following expression might help you design your own:

^(src=\x22(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)\x22)[\s\S]*

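I have added several boundaries to it, just to be safe. You can remove or simplify them; note in particular that the ^ anchor requires the input to begin with src=, so drop it if the URL can appear anywhere in the HTML. Your desired URL is in the second capturing group: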

 (https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)
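A detail worth checking in patterns like this one: a bare . matches any character, while \. matches only a literal dot. The small sketch below illustrates the difference; the test hostnames are made up for illustration and are not from the question.

```javascript
// Illustrative patterns only; the test strings are hypothetical.
const unescaped = /^https:\/\/[a-z]+.com\//;  // "." matches any single character
const escaped = /^https:\/\/[a-z]+\.com\//;   // "\." matches a literal dot only

console.log(unescaped.test("https://exampleXcom/")); // true
console.log(escaped.test("https://exampleXcom/"));   // false
console.log(escaped.test("https://example.com/"));   // true
```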

The key is to append [\s\S]*, which consumes everything after the first URL, so that replacing the whole match with the second group leaves only that URL.
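If the HTML does not start with the src attribute, an alternative is to skip the replace step and read the capturing group directly. The following is a minimal sketch, assuming the URL always follows the example.com-style format from the question; firstMp3Url is a hypothetical helper name, and the sample htmlString is invented for illustration.

```javascript
// Hypothetical helper: returns the first mp3 URL found in htmlString, or null.
function firstMp3Url(htmlString) {
  // No ^ anchor and no g flag: match() reports the first occurrence anywhere in the input.
  const regex = /src="(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)"/;
  const found = htmlString.match(regex);
  // Group 1 holds the URL without the surrounding src="...".
  return found ? found[1] : null;
}

// Invented sample input: one unrelated URL, then the mp3 link.
const htmlString =
  '<html>\n<img src="https://example.com/logo.png">\n' +
  '<audio src="https://example.com/mp3s/2342344?id=24362456"></audio>\n</html>';

const url = firstMp3Url(htmlString);
console.log(url);
```

Because match() without the g flag stops at the first hit, other URLs in the document are ignored unless they fit the same format and appear earlier.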


Graph

This graph shows how it would work:

[Image unavailable: graph of the expression]

JavaScript demo with a 10-million-iteration performance benchmark

const repeat = 10000000;

// Sample input: the first src attribute holds the URL we want.
const string = 'src="https://example.com/mp3s/2342344?id=24362456" src="https://example.com/mp3s/08103480132984?id=0a0f8ad0f8" src="https://example.com/mp3s/2342344?id=24362456" href="https://example.com/mp3s/2342344?id=91847890" src="https://example.com/mp3s/2342344?id0980184"';
// Group 2 captures the URL; [\s\S]* consumes the rest so replace() leaves only the URL.
const regex = /^(src=\x22(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)\x22)[\s\S]*/;

let match;
const start = Date.now();

for (let i = 0; i < repeat; i++) {
  match = string.replace(regex, "$2");
}

const end = Date.now() - start;

console.log(match + " is a match");
console.log(end / 1000 + " seconds is the runtime of the " + repeat + "-iteration benchmark test.");
Emma