I have a string and I want to extract the parts of it that match a pattern. For example, in this string:
"<html><head><script src="http://example.com"></script>..."
I need to find every script tag in the string and get the URL in its `src` attribute.
I was going to use `String.prototype.replace()`, which accepts a regular expression, but it requires a replacement value and returns the whole modified string rather than just the matched parts.
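For the "give me the matches, not a replaced string" part, `String.prototype.matchAll()` (or `match()` with the `g` flag) returns the matches themselves and leaves the input untouched. Below is a minimal sketch that assumes double-quoted `src` attributes; the sample input and the `scriptSrc` pattern are illustrative, not taken from the answers. As the comments point out, a regex is a fragile way to read HTML: single-quoted or unquoted attributes, HTML comments, and `<script` text inside strings will all trip it up.

```javascript
// Hypothetical input modelled on the string from the question.
const input =
  '<html><head><script src="http://example.com"></script>' +
  '<script type="module" src="/assets/app.js"></script></head></html>';

// <script\b      : the start of a script tag (but not e.g. "<scripts")
// [^>]*\s        : any other attributes inside the same tag, then whitespace
// src="([^"]*)"  : a double-quoted src attribute; the group captures the URL
// The g flag is required: matchAll throws a TypeError for a non-global RegExp.
const scriptSrc = /<script\b[^>]*\ssrc="([^"]*)"/g;

// matchAll returns an iterator of match arrays; index 1 is the captured URL.
// Unlike replace, nothing in the input string is changed.
const urls = Array.from(input.matchAll(scriptSrc), (m) => m[1]);
console.log(urls); // [ 'http://example.com', '/assets/app.js' ]
```

A capture group is used here instead of a lookbehind, so each match array still holds the whole matched tag prefix in `m[0]` while `m[1]` is just the URL.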

Cole Brennan
- If your input string is well-formed HTML, I recommend you use an HTML parser rather than regex. – CAustin May 27 '23 at 03:31
- Obligatory link: [You can't parse \[X\]HTML with regex.](https://stackoverflow.com/a/1732454) – InSync May 27 '23 at 06:36
- @CAustin exactly! JavaScript actually has a built-in XML/HTML parser called `DOMParser` (see my answer). Even if the HTML is badly formed, there is probably a library for fixing broken HTML that one could use before parsing. – Levi May 27 '23 at 07:03
2 Answers
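```javascript
// Search the page's serialized markup for the src="..." value of each <script ...></script> tag.
// The lookbehind (?<=...) and lookahead (?=...) are not part of the match, so only the URL is returned.
// Note: match() returns null (not an empty array) when nothing matches.
const sources = document.documentElement.innerHTML.match(/(?<=<script.+src=")[^"]+(?=".*>\s*<\/script\s*>)/g);
console.log(sources);
```

```html
<script
  src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.2/knockout-min.js"
  type="text/javascript"
></script>
<script defer src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
```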
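The pattern in this answer is run against `document.documentElement.innerHTML`. Here is a rough sketch of the same pattern applied to a plain string like the one in the question (the sample strings are made up for illustration). One caveat: `.` does not match line breaks, so on raw markup a `<script` tag whose `src` sits on its own line is not matched. In the demo this is presumably not a problem because the browser's `innerHTML` serialization writes each tag's attributes on a single line.

```javascript
// Same regular expression as in the answer, applied to a string instead of the live DOM.
const scriptSrcPattern = /(?<=<script.+src=")[^"]+(?=".*>\s*<\/script\s*>)/g;

// Sample input modelled on the question (hypothetical).
const html = '<html><head><script src="http://example.com"></script></head></html>';

// With the g flag, match() returns every matched URL, or null if there are none.
const sources = html.match(scriptSrcPattern) ?? [];
console.log(sources); // [ 'http://example.com' ]

// "." does not match "\n", so the lookbehind fails when src is on its own line.
const multiline = '<script\n  src="https://example.com/a.js"\n></script>';
const multilineSources = multiline.match(scriptSrcPattern) ?? [];
console.log(multilineSources); // []
```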

Alexander Nenashev
JavaScript in the browser is pretty good at handling HTML markup, so you probably don't need a regex here.
This should do the trick:
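```javascript
//<!--
var html_code = `<html>
<head>
<script src="https://code.jquery.com/jquery-3.7.0.slim.min.js"></script>
<script src="/assets/script.js"></script>
</head>
<body>
<script>
var code = 'nope';
</script>
<p>Other stuff</p>
<footer><script src="/assets/footer.js"></script></footer>
</body>
</html>`;
// -->
// Parse the string into a separate Document using the browser's HTML parser.
const parser = new DOMParser();
const html_doc = parser.parseFromString(html_code, 'text/html');
// Select only <script> elements that have a src attribute; inline scripts are skipped.
const script_tags = html_doc.querySelectorAll('script[src]');
// getAttribute returns the value as written, so relative paths stay relative.
const sources = Array.from(script_tags).map((s) => s.getAttribute('src'));
console.log(sources);
// → ["https://code.jquery.com/jquery-3.7.0.slim.min.js", "/assets/script.js", "/assets/footer.js"]
```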
If you only need the script sources from the current page's DOM in the browser, this is all you need:

```javascript
console.log(Array.from(document.querySelectorAll('script[src]')).map((s) => s.getAttribute('src')));
```
Side note: the `// <!--` and `// -->` lines are there, as suggested by @InSync, to make the JSFiddle run; it won't run reliably when the script contains HTML code in strings.

Levi
- A trick you may utilize: just enclose `html_code` in `// <!--` and `// -->` (or even `<!--` and `-->` for that matter). – InSync May 27 '23 at 07:10
- Wow, I never knew about HTML-commenting JavaScript code that contains HTML strings. I had a problem like that before. – Cole Brennan Jun 08 '23 at 01:47