let htmlTEXT = `<h1>page 1</h1>
<input type="text" spellcheck="false" data-ms-editor="true">
<script src="/static/test.js">
    console.log("hello");
    console.log("goodbye");
</script>`

// hopeful regex magic to produce the below
// -> /static/test.js
/* -> 
   console.log("hello");
   console.log("goodbye"); */

I have a string which contains HTML, and I would like two regular expressions to get

  1. the script's src value
  2. the script's content
Sean Lee
  • 1
    Does this have to be done with regular expressions? It would be a lot more robust if you have the option to use javascript to create a new HTMLElement and use that to parse the htmlTEXT string. – NetByMatt Feb 07 '23 at 16:00
  • 1
    Obligatory: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Please pick [a better tool](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString) for this problem. – Quentin Feb 07 '23 at 16:04

1 Answer

If you're doing this client-side then, rather than a regex, the easiest modern way is probably to create a DOMParser and call its parseFromString method on the string.

For Node.js, look into cheerio.

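If you see the following error:

SyntaxError: `` literal not terminated before end of script

you'll need to escape the closing / in the </script> tag first. This happens when the code sits inline in a <script> element of an HTML page: the HTML parser ends the script at the first </script> it finds, even inside a string or template literal. In a standalone .js file the string works as written.

So change that to <\/script>.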
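As a quick sanity check that the escape changes only how the source is read and not the resulting string, here is a minimal sketch (the variable names are just for illustration). In an untagged template literal, `\/` evaluates to a plain `/`:

```javascript
// "\/" is an identity escape in untagged template literals: it evaluates to "/".
// Writing "<\/script>" therefore produces the same text as the real closing tag,
// but an HTML parser no longer sees a literal closing tag inside an inline script.
const escaped = `<script src="/static/test.js"><\/script>`;

// Built by concatenation so this snippet never contains the raw closing tag itself.
const expected = '<script src="/static/test.js"><' + '/script>';

console.log(escaped === expected);    // true
console.log(escaped.includes('\\'));  // false: no backslash ends up in the string
```

So the parser, whichever one you use, receives exactly the same HTML either way.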

Then you can parse the string, and then pick out its src and its text content.

const str = `<h1>page 1</h1>
<input type="text" spellcheck="false" data-ms-editor="true">
<script src="/static/test.js">
    console.log("hello");
    console.log("goodbye");
<\/script>`;

const parser = new DOMParser();
const doc = parser.parseFromString(str, 'text/html');

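const script = doc.querySelector('script');

// getAttribute returns the value exactly as written ("/static/test.js");
// the .src property would resolve it to an absolute URL instead.
console.log(script.getAttribute('src'));
console.log(script.textContent);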
Andy
  • "However - and this action needs to be taken whatever process you use - you need to escape the the closing `/` in the `</script>` element first": This is only true if the JS appears inside a `<script>` element. – Quentin Feb 07 '23 at 16:14
  • The DOMParser API is available in browsers, but the OP hasn't said they are running their JS in a browser (as opposed to, for example, Node.js). – Quentin Feb 07 '23 at 16:14