
I'm given a string which contains the contents of an HTML document, and I need to modify some of the URLs contained within the document. The URLs which need modification begin with the form:

<script src="https://foo.com/some/variable/path/to/file.js" ...

And must be modified to:

<script src="https://foo.com/some/variable/path/to/NEW/file.js" ...

My current approach has been to use the GlobalReplace function from Google's RE2 library with the regexp:

"(?i)(<script\\s+(?:[^>]+\\s+)?src=[\"']https://foo\\.com/" "(?:.*?/)*?)(.*?\\.js[\"'][^>]*>)"

This almost worked, until I realized that the HTML I'm given might already have some of the URLs modified and others not. The ones that are already modified should be left alone.

Question: What's the easiest way to go about modifying the URLs without modifying the ones that have already been modified upstream?

A single pass approach is essential.

Deomachus
  • Write a regex that doesn't match the new ones. Since you're obfuscating them we can't help you do it. But at least we don't know the general organization of your site's files (heaven forbid) :P – erik258 Jan 06 '17 at 16:46
  • Apologies, I'm not sure I understand. Do you mean to write a regex that doesn't match the form of the modified URL? If so, I've considered that, but I'm not actually sure how that would help. – Deomachus Jan 06 '17 at 17:07
  • Then you can do a standard regex replace process and voila, only the broken urls are fixed, and since the new ones don't match, they won't be replaced. – erik258 Jan 06 '17 at 17:09
  • Pardon me if I'm wrong or misunderstanding, but I don't think that would work. The document I'm given can have any number of such script tags, some of which may need modifications, and some of which may already be modified. I need to be able to modify the ones that have not yet been modified, and to leave the already modified ones alone. – Deomachus Jan 06 '17 at 17:12
  • I realized my original post did not make that clear, so I've gone ahead and edited it. – Deomachus Jan 06 '17 at 17:21
  • Take a look at http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word . – erik258 Jan 06 '17 at 17:24
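
The link above relies on a negative lookahead, which RE2 does not support. One possible single-pass alternative, sketched here in Python's `re` module rather than RE2 itself, is to let the pattern optionally consume an existing `NEW/` segment outside the capture groups and always write it back, so that modified and unmodified tags end up identical. The pattern is an adaptation of the one in the question, the name `add_new_segment` is hypothetical, and the sketch assumes `NEW/` is always the last directory before the file name and that the URL ends in `.js` immediately before the closing quote.

```python
import re

# Adapted from the question's pattern (an assumption, not the asker's code).
# It uses no lookarounds or backreferences, so the same pattern text should
# also be accepted by RE2.
SCRIPT_SRC = re.compile(
    # Group 1: everything up to and including the last directory before NEW/.
    r"""(?i)(<script\s+(?:[^>]+\s+)?src=["']https://foo\.com/(?:[^"'/>]+/)*?)"""
    # An already-present NEW/ segment is matched here but not captured.
    r"""(?:NEW/)?"""
    # Group 2: the file name, the closing quote and the rest of the tag.
    r"""([^"'/>]+\.js["'][^>]*>)"""
)


def add_new_segment(html: str) -> str:
    """Insert NEW/ before the file name unless it is already there."""
    # With RE2::GlobalReplace the rewrite string would be written \1NEW/\2.
    return SCRIPT_SRC.sub(r"\g<1>NEW/\g<2>", html)


if __name__ == "__main__":
    sample = (
        '<script src="https://foo.com/some/variable/path/to/file.js"></script>\n'
        '<script src="https://foo.com/some/variable/path/to/NEW/file.js"></script>\n'
    )
    print(add_new_segment(sample))
```

Because the replacement always emits `NEW/` exactly once, running the function on its own output changes nothing. One caveat: since `(?i)` applies to the whole pattern, an existing lowercase `new/` directory would be rewritten as `NEW/`. If that matters, the optional group can probably be made case-sensitive with a scoped flag such as `(?-i:NEW/)`.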

0 Answers