
I have a string:

<h1>hello</h1>
<script src="http://www.test.com/file1.js"></script>
<script src="http://www.test.com/file2.js"></script>
<p>bye</p>

and I need to generate an array with the URLs found in the string:

['http://www.test.com/file1.js', 'http://www.test.com/file2.js']

I also need to replace each entire line (including the script tags) with nothing.

This is what I have so far to find the URLs:

^(<script src=")(.*)("><\/script>)$

The problem is that it only works with

<script src="http://www.test.com/file1.js"></script>

If I define my scripts like this

<script id="something" src="http://www.test.com/file1.js"></script>

it doesn't work.

Emma
handsome

4 Answers

3

Consider using a proper HTML parser instead, like cheerio: find <script> tags, remove them, and push their src to an array:

const cheerio = require('cheerio');

const htmlStr = `<h1>hello</h1>
<script src="http://www.test.com/file1.js"></script>
<script src="http://www.test.com/file2.js"></script>
<p>bye</p>`;
const $ = cheerio.load(htmlStr);

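const urls = [];
$('script').each((_, script) => {
  // cheerio nodes are not browser DOM elements, so read the attribute via .attr()
  urls.push($(script).attr('src'));
  $(script).remove();
});
const result = $('body').html();
console.log(urls);
console.log(result);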
CertainPerformance
-1

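To get only the URLs, you can use:

^<script.*?src="([^"]*)".*?><\/script>$

This handles attributes both before and after the src attribute. The [^"]* capture stops at the closing quote of src, so trailing attributes do not end up in the group. Because the tags sit on separate lines of one string, apply it with the multiline flag so that ^ and $ match at each line.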
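
A minimal sketch of how a pattern of this kind could be applied in JavaScript to build the array and strip the lines in one pass. It assumes one script tag per line, double-quoted attribute values, and a runtime with String.prototype.matchAll. The sample input, the id and defer attributes, and the names html, scriptLine, urls and stripped are illustrative and not from the question. The pattern is a slightly tightened variant that uses [^>]* so a match cannot run past the end of the opening tag.

```javascript
// Sample input; the id and defer attributes are added here for illustration.
const html = [
  '<h1>hello</h1>',
  '<script id="something" src="http://www.test.com/file1.js"></script>',
  '<script src="http://www.test.com/file2.js" defer></script>',
  '<p>bye</p>',
].join('\n');

// g: find every match; m: let ^ and $ anchor at each line.
// [^"]* stops the capture at the closing quote of src.
// The trailing \n? also consumes the line break, so no empty line is left behind.
const scriptLine = /^<script\b[^>]*?\bsrc="([^"]*)"[^>]*><\/script>$\n?/gm;

// Collect capture group 1 (the src value) from every matching line.
const urls = Array.from(html.matchAll(scriptLine), (m) => m[1]);

// Remove every matching line from the string.
const stripped = html.replace(scriptLine, '');

console.log(urls);
console.log(stripped);
```

For markup that is not this regular (single quotes, tags split across lines, inline scripts), an HTML parser is the safer choice.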

Jenian
-1

This regex might help you get those URLs:

^<.+="(.+)"><\/.+>$

It has a single capturing group, which holds the value of the last attribute in the tag (the target URL when src comes last), and discards everything else. It also matches <a> tags and other tags with the same open/close pattern.

Emma
-1

Use this instead:

^(<script )(.*)(src=")(.*)("><\/script>)$

The 4th group holds the URL.

Or use ^(?:<script )(?:.*)(?:src=")(.*)(?:"><\/script>)$ with non-capturing groups, in which case the URL is in group 1.
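
A short sketch of how the group numbering differs between the two patterns in JavaScript. The variable names are illustrative, and the test line is the one from the question. Both patterns require "></script> to follow the src value directly, so they assume src is the last attribute.

```javascript
const line = '<script id="something" src="http://www.test.com/file1.js"></script>';

// Every part is a capturing group, so the URL lands in group 4.
const capturing = /^(<script )(.*)(src=")(.*)("><\/script>)$/;

// Only the URL is captured, so it lands in group 1.
const nonCapturing = /^(?:<script )(?:.*)(?:src=")(.*)(?:"><\/script>)$/;

const urlFromGroup4 = line.match(capturing)[4];
const urlFromGroup1 = line.match(nonCapturing)[1];

console.log(urlFromGroup4);
console.log(urlFromGroup1);
```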