0

I have a regular expression like this:

/(?:<script\s+[^>]*?src="([^"]+)"[^>]*><\/script>)|(?:<link\s+[^>]*?href="([^"]+)"[^>]*>)/g

I want to use this regexp in JavaScript to replace the "src" of a <script> tag or the "href" of a <link /> tag.

The code looks like this:

html.replace( /(?:<script\s+[^>]*?src="([^"]+)"[^>]*><\/script>)|(?:<link\s+[^>]*?href="([^"]+)"[^>]*>)/g, function( m, n ) {
    return m.replace( n, 'other url' );
} );

This works fine for a <script> tag but not for a <link> tag. Capture groups are numbered across the whole pattern, so the first ([^"]+) is always passed as the second argument, and the "n" param is undefined whenever the match is not a <script> tag. To handle a <link> match, the code has to be changed to:

html.replace( /(?:<script\s+[^>]*?src="([^"]+)"[^>]*><\/script>)|(?:<link\s+[^>]*?href="([^"]+)"[^>]*>)/g, function( m, n ) {
    return m.replace( arguments[ 2 ], 'other url' );
} );
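
To make the group numbering concrete, here is a small sketch that runs the same pattern (without the g flag) against one tag of each kind. The two sample tags are made up for illustration.

```javascript
// Same pattern as above, minus the g flag so that exec() keeps no state between calls.
var pattern = /(?:<script\s+[^>]*?src="([^"]+)"[^>]*><\/script>)|(?:<link\s+[^>]*?href="([^"]+)"[^>]*>)/;

// Hypothetical sample tags, one per alternative.
var scriptMatch = pattern.exec('<script src="a.js"></script>');
var linkMatch = pattern.exec('<link href="b.css">');

// Group 1 belongs to the <script> alternative, group 2 to the <link> alternative.
// The group of the alternative that did not match is left undefined.
console.log(scriptMatch[1], scriptMatch[2]); // a.js undefined
console.log(linkMatch[1], linkMatch[2]);     // undefined b.css
```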

Is there any way to make the regexp skip the first capture group when the match is not a <script> tag?

Mr Lister
LCB

2 Answers

1

It sounds like what you want is:

html.replace(/(<script\s[^>]*?src="|<link\s[^>]*?href=")[^"]+"/g, function ($0, $1) {
    return $1 + 'other url' + '"';
});

(with the usual caveats that "You can't parse [X]HTML with regex").
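
As a rough illustration of what that single-group pattern does, here is a sketch that wraps it in a function and runs it on a made-up snippet. The names `rewriteUrls`, `input` and `output` are hypothetical and only there for the demo.

```javascript
// One capture group holds everything up to and including the opening quote of the URL,
// whichever tag it came from, so the callback never has to ask which alternative matched.
var pattern = /(<script\s[^>]*?src="|<link\s[^>]*?href=")[^"]+"/g;

// Hypothetical wrapper: swaps every matched URL for newUrl.
function rewriteUrls(html, newUrl) {
    return html.replace(pattern, function ($0, $1) {
        return $1 + newUrl + '"';
    });
}

// Made-up input with one tag of each kind.
var input = '<script type="text/javascript" src="app.js"></script>\n' +
            '<link rel="stylesheet" href="style.css">';
var output = rewriteUrls(input, 'other url');

console.log(output);
// <script type="text/javascript" src="other url"></script>
// <link rel="stylesheet" href="other url">
```

Because the prefix is re-emitted as-is and only the URL is swapped, attributes before src or href survive untouched.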


Edited to add: The "minimal fix" would be to write your replacement-function like this:

function ($0, $1, $2) {
    return $0.replace($1 || $2, 'other url');
}

where || is the Boolean OR operator: $1 || $2 means "if $1 is truthy, then $1; otherwise, $2". A non-empty string is truthy, whereas undefined is falsy, so $1 || $2 will evaluate to whichever of your capture-groups matched something.

(Note: if your capture-groups were able to match the empty string, you'd have to write something a bit more complicated, since you wouldn't want to end up with $2 if $1 is '' and $2 is undefined. But in your example that doesn't apply.)
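
For what it's worth, one way the "bit more complicated" version could look is to test against undefined explicitly instead of relying on truthiness. This is only a sketch: `firstDefined` and `replaceUrls` are hypothetical names, and the sample HTML is made up.

```javascript
// Hypothetical helper: returns the first group that took part in the match.
// Unlike $1 || $2, an empty-string capture still counts as "matched".
function firstDefined(...groups) {
    return groups.find(function (g) { return g !== undefined; });
}

// Same pattern as in the question.
var pattern = /(?:<script\s+[^>]*?src="([^"]+)"[^>]*><\/script>)|(?:<link\s+[^>]*?href="([^"]+)"[^>]*>)/g;

function replaceUrls(html, newUrl) {
    return html.replace(pattern, function (m, $1, $2) {
        // Caveat: m.replace() swaps the first occurrence of the captured text
        // anywhere in the tag, which is not necessarily the attribute value.
        return m.replace(firstDefined($1, $2), newUrl);
    });
}

var sample = '<script src="a.js"></script><link rel="stylesheet" href="b.css">';
var result = replaceUrls(sample, 'other url');

console.log(result);
// <script src="other url"></script><link rel="stylesheet" href="other url">
```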

ruakh
  • Thanks. I know how to solve this problem; in fact, I want to know something about regexp – LCB Jan 18 '16 at 10:20
  • @LCB: OK. I've edited my answer to provide a "minimal fix" for you. Though it doesn't actually have to do with the regex at all! – ruakh Jan 18 '16 at 18:44
0

@ruakh is correct: you shouldn't be using regex to parse HTML. Try this instead:

var div = document.createElement('div');
div.innerHTML = html;

var scriptTags = div.getElementsByTagName('script');
for (var i = 0; i < scriptTags.length; i++)
    scriptTags[i].src = 'other url';

var linkTags = div.getElementsByTagName('link');
for (var i = 0; i < linkTags.length; i++)
    linkTags[i].href = 'other url';

If you can use jQuery, it's even easier:

var div = $('<div/>').html(html);
div.find('script').attr('src', 'other url');
div.find('link').attr('href', 'other url');
Vitani