I need help with a RegEx that I am using to find strings to replace via a Grunt task.

I am wanting to swap out locally referenced image URLs with their external URLs. The file structure is the same both locally and externally.

I need to look for /img or img at the start of a URL (either in an img tag's src attribute, or CSS rules) and swap in the external URL.

E.g. look for /img/some-photo.jpg or img/some-photo.jpg and replace /img or img with https://www.example.com, giving https://www.example.com/some-photo.jpg.

I can't just look for img alone as that would also match the string in the following HTML <img .../> turning the tag into <https://www.example.com .../>.

I can exclude the img tag like this:

/\/img|(^|[^\<])img/gim

But that also matches, for example:

(img 'img "img

etc.

I don't want to exclude the string img in these examples; I just don't want to capture the preceding characters ((, ', ", etc.) along with it.
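To make the over-capture concrete, here is a minimal sketch that runs the pattern above over a made-up sample string (the sample and the variable names are assumptions for illustration, not taken from the real project). It lists what the pattern matches and shows what a plain replacement does to the surrounding characters.

```javascript
// Hypothetical sample: an img tag, a CSS url() without quotes, and one with quotes.
const sample = `<img src="img/a.jpg"> url(img/b.jpg) url('/img/c.jpg')`;

// The pattern from the question.
const pattern = /\/img|(^|[^\<])img/gim;

// Each match includes the character in front of "img" (except the "/img" case).
const matches = sample.match(pattern);
console.log(matches); // [ '"img', '(img', '/img' ]

// Replacing the whole match therefore swallows the opening quote and bracket.
const replaced = sample.replace(pattern, 'https://www.example.com');
console.log(replaced);
```

The tag itself is left alone, but the `"` and `(` in front of the paths are lost in the output.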

You can see this in action here: https://regexr.com/3qbeq

Jonathon Oates
  • Or to put it another way: Don't use RegEx (on its own) to parse HTML. More humorously: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454 Use an HTML parser to parse HTML. There are several you can use from Grunt. – T.J. Crowder May 31 '18 at 14:24
  • Because the Grunt task I'm using can only accept a string to find, or a RegEx to find a match. Personally I would want to use Gulp and a task that can accept a function. Though the codebase I've inherited is already using Grunt and grunt-text-replace – Jonathon Oates May 31 '18 at 14:26

2 Answers


It looks like you need a negative lookbehind, or, if your code always encloses attributes in single or double quotes, you can include those in the pattern.

A negative lookbehind solution would be something like (?<!\<)img which should match any string img that doesn't have a < in front. A quick internet search for "negative lookbehind" will yield many examples. Lookaheads and lookbehinds are a "level-up" in regex mastery.
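As a rough sketch of the lookbehind idea in JavaScript: lookbehind assertions were added to the language in ES2018, so this assumes a runtime that implements them (older engines reject the pattern with a SyntaxError). The sample string is made up, and the pattern is extended with an optional leading slash so that /img is replaced as a whole.

```javascript
// Hypothetical sample: an img tag, a CSS url() without quotes, and one with quotes.
const sample = `<img src="img/a.jpg"> url(img/b.jpg) url('/img/c.jpg')`;

// Match "img" or "/img" only when the character before the match is not "<".
// The lookbehind checks that character without consuming it.
const pattern = /(?<!<)\/?img/g;

const replaced = sample.replace(pattern, 'https://www.example.com');
console.log(replaced);
// <img src="https://www.example.com/a.jpg"> url(https://www.example.com/b.jpg) url('https://www.example.com/c.jpg')
```

Because the lookbehind is zero-width, the quote or bracket in front of the path stays in place.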

Or, just add in the quotes and make the slash optional: "\/?img will match attributes but not tags, because tags don't begin with a quote. Again, this only works if you enclose all your attributes in quotes. Perhaps not as elegant or failsafe as a lookbehind, but it might do the job.
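Where lookbehind is not available, a common workaround (not part of the suggestions above, so treat it as an assumption about what fits your setup) is to capture the preceding character in a group and put it back in the replacement with `$1`. A minimal sketch using plain `String.prototype.replace` and a made-up sample string:

```javascript
// Hypothetical sample: an img tag, a CSS url() without quotes, and one with quotes.
const sample = `<img src="img/a.jpg"> url(img/b.jpg) url('/img/c.jpg')`;

// Group 1 is either the start of the input or any single character except "<".
// It is matched, but re-inserted via $1, so it is not lost.
const pattern = /(^|[^<])\/?img/g;

const replaced = sample.replace(pattern, '$1https://www.example.com');
console.log(replaced);
// <img src="https://www.example.com/a.jpg"> url(https://www.example.com/b.jpg) url('https://www.example.com/c.jpg')
```

This works with or without quotes around the path, since whatever precedes img is restored as-is.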

Sean Hogge
  • Thanks. I'm using JavaScript (Grunt is a Node based task runner) and it doesn't support negative lookbehind I'm afraid — I've already looked at that :-) If I could find a way to only match the last three characters that may work. – Jonathon Oates May 31 '18 at 14:35
  • Oh really? I didn't know that, being only passingly familiar with Grunt. The quote solution would still work, you just have to add the quote back in the replacement text. Or even include the whole "src=\/?img" portion to get more specific. – Sean Hogge May 31 '18 at 18:13

I was grossly overthinking the problem.

If the images are all in a directory img, it must therefore be followed by a forward slash. Tags (and, for that matter, CSS rules like img {}) aren't followed by a forward slash, so I only have to look for a match of img/ or /img/.

This can easily be achieved by /(\/?img\/)/gi.
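A quick check of that pattern, as a sketch: the sample string is made up, and the replacement `https://www.example.com/` (with a trailing slash, since the match now includes one) is an assumption about the external base URL.

```javascript
// Hypothetical sample: an img tag, a CSS rule for img, and a CSS url().
const sample = `<img src="img/a.jpg"> img { border: 0 } url(/img/b.jpg)`;

// Only "img/" or "/img/" matches; "<img " and "img {" have no slash after them.
const pattern = /(\/?img\/)/gi;

const replaced = sample.replace(pattern, 'https://www.example.com/');
console.log(replaced);
// <img src="https://www.example.com/a.jpg"> img { border: 0 } url(https://www.example.com/b.jpg)
```

One thing to watch for: the pattern also matches any path segment that merely ends in img, such as thumbimg/, so it relies on no such directory existing in the project.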

Jonathon Oates