3

I have some markup that includes images with the following src attribute:

https://url.com/image/img.jpg

I want to rewrite any image src (or link href) whose path contains /image/. So the result would be:

https://newurl.com/img.jpg

I've tried using:

/src="(?:[^'\/]*\/)*([^']+)"/g

but I'm not sure how to get it to match only /image/ paths. Also, when I change src to href, it doesn't seem to let me replace both.

To clarify, I'm parsing plain text that happens to contain HTML strings. I also need to keep the file name but replace the host address.

Update: Here's a jsfiddle of what I have so far. It works for the src, but it doesn't take the /image/ in the path into account, and it also removes the href.

Josh Crozier
dzm

2 Answers

2

Obligatory: don't use regex to parse HTML...


Since you are already using JavaScript, you could use the native DOM API to iterate over all of the img elements and update the src attributes:

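// Strip the /image/ segment from the src of every <img> (forEach, since no return value is needed)
Array.prototype.forEach.call(document.querySelectorAll('img'), function(img) {
  img.src = img.src.replace(/\/image\//, '/');
});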

But since you clarified that you have a string that contains HTML, you could create a temporary element, insert the string as HTML, replace the src attributes, and then retrieve the updated innerHTML property value.

For example:

var content = `string of content containing random text, some elements, <p>and paragraphs</p> and more text.. <img src="https://url.com/image/img.jpg" /><img src="https://url.com/image/img.jpg" />`;

// Create the temporary DOM element
var temporaryElement = document.createElement('div');
temporaryElement.innerHTML = content;

// Replace the `src` attributes
Array.from(temporaryElement.querySelectorAll('img')).forEach((img) => {
  img.src = img.src.replace(/\/image\//, '/');
});

// Retrieve the updated `innerHTML` property
var updatedContent = temporaryElement.innerHTML;
console.log(updatedContent);
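
The snippets above only strip the /image/ segment and leave the host unchanged, while the question also asks for a new host with the file name preserved. A minimal sketch of a per-URL helper for that, using the standard URL class: the names rewriteImageUrl and NEW_ORIGIN are hypothetical, and the target host is assumed from the question's expected result.

```javascript
// Assumed target origin, taken from the question's expected output.
const NEW_ORIGIN = 'https://newurl.com';

// Hypothetical helper: rewrites one absolute URL if its path has an /image/ segment.
function rewriteImageUrl(value) {
  let url;
  try {
    url = new URL(value);
  } catch (err) {
    return value; // not an absolute URL: leave it untouched
  }
  // Requiring both slashes avoids matching file names such as image.jpg.
  if (!url.pathname.includes('/image/')) {
    return value;
  }
  // Keep only the last path segment (the file name); any query string is dropped.
  const fileName = url.pathname.split('/').pop();
  return `${NEW_ORIGIN}/${fileName}`;
}

console.log(rewriteImageUrl('https://url.com/image/img.jpg'));
// -> https://newurl.com/img.jpg
```

Inside the forEach loop above this could be called as img.src = rewriteImageUrl(img.src), and the same helper could be applied to the href of elements selected with 'a[href]'.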
Josh Crozier
  • Sorry, I should have clarified: I'm not parsing the DOM, this is just plain text content that happens to have HTML elements. – dzm Jan 30 '17 at 19:39
  • This one fails if the URL happens to end in `image` like `https://newurl.com/somepath/image.jpg` - check for the presence of the trailing slash. Then again, the OP didn't explicitly say that wasn't the desired behaviour... never mind. – Adam Jenkins Jan 30 '17 at 19:40
  • @dzm - Like this? I just updated the fiddle you posted.. https://jsfiddle.net/3mgo3u8L/ – Josh Crozier Jan 30 '17 at 19:51
  • @JoshCrozier Thanks, it looks like this could work, although I'd really prefer to use regex. I'm doing this server side, and while I could put it into a DOM, I think it'd be much cleaner/faster to use regex. – dzm Jan 30 '17 at 20:00
0

This should work.

REGEXP:

(?:^(?:https:\/{2})(?:\w*\.*|\d*\.*)(?:\w*|\d*)*(?:\/image\/))(.+)$

INPUT:

https://url.com/image/img.jpg
https://url.com/image/asdasdasd.jpg

REPLACE WITH:

https://newurl.com/$1

RESULT:

https://newurl.com/img.jpg
https://newurl.com/asdasdasd.jpg

JAVASCRIPT CODE:

const regex = /(?:^(?:https:\/{2})(?:\w*\.*|\d*\.*)(?:\w*|\d*)*(?:\/image\/))(.+)$/gm;
const str = `https://url.com/image/img.jpg
https://url.com/image/asdasdasd.jpg
`;
const subst = `https://newurl.com/\$1`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

See: https://regex101.com/r/I4hHV7/2

  • This is close, but if there's any other text in `str`, it doesn't seem to replace. – dzm Jan 30 '17 at 19:59
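
The pattern above is anchored with ^ and $, so it only matches when a URL is alone on its line, which is why surrounding text defeats it. A rough sketch of an unanchored variant that targets src and href attribute values inside arbitrary text: it assumes double-quoted attributes and absolute http(s) URLs, the names NEW_HOST, IMAGE_ATTR and replaceImageUrls are hypothetical, and it remains a regex over HTML, so unusual markup may slip through.

```javascript
// Assumed target host, taken from the question's expected result.
const NEW_HOST = 'https://newurl.com/';

// Matches src="..." or href="..." (double quotes only) whose URL contains an
// /image/ path segment. Group 1 = attribute name, group 2 = file name.
// Note: \b also lets attributes such as data-src match.
const IMAGE_ATTR =
  /\b(src|href)="https?:\/\/[^"\/\s]+(?:\/[^"\/\s]+)*?\/image\/(?:[^"\/\s]+\/)*([^"\/\s]+)"/g;

function replaceImageUrls(text) {
  // Rebuild the attribute with the new host and the original file name.
  return text.replace(
    IMAGE_ATTR,
    (match, attr, fileName) => `${attr}="${NEW_HOST}${fileName}"`
  );
}

const sample =
  'Some text <img src="https://url.com/image/img.jpg" /> and ' +
  '<a href="https://url.com/image/a/b.png">link</a> plus ' +
  '<img src="https://url.com/other/image.jpg">';

console.log(replaceImageUrls(sample));
```

With this sample, the first two URLs become https://newurl.com/img.jpg and https://newurl.com/b.png, while the last one is left alone because its path has no /image/ segment.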