I'm trying to find a regular expression that would allow me to replace the SRC attribute in an image. Here is what I have:

function getURL($matches) {
  global $rootURL;
  return $rootURL . "?type=image&URL=" . base64_encode($matches[1]);
}

$contents = preg_replace_callback("/<img[^>]*src *= *[\"']?([^\"']*)/i", 'getURL', $contents);

For the most part, this works well, except that anything before the src=" attribute is eliminated when $contents is echoed to the screen. In the end, SRC is updated properly and all of the attributes after the updated image URL are returned to the screen.

I am not interested in using a DOM or XML parsing library, since this is such a small application.

How can I fix the regex so that only the value for SRC is updated?

Thank you for your time!

OM The Eternity
Oliver Spryn

4 Answers

Use a lazy star instead of a greedy one.

This may be your problem:

/<img[^>]*src *= *[\"']?([^\"']*)/
         ^

Change it to:

/<img[^>]*?src *= *[\"']?([^\"']*)/

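This way, [^>]*? matches as few characters as possible before src, rather than as many as possible. Note that the callback still replaces the entire match, so anything matched before the URL has to be captured and returned as well if you want to keep it.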
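
To see what the lazy quantifier actually changes, here is a small sketch. The sample tag with a data-src attribute is made up for illustration; it is not from the question.

```php
<?php
// Hypothetical sample tag with two attribute names that end in "src".
$tag = '<img data-src="a.png" src="b.png">';

// Greedy: [^>]* runs to the end of the tag, then backtracks to the LAST "src".
preg_match('/<img[^>]*src *= *["\']?([^"\']*)/i', $tag, $greedy);

// Lazy: [^>]*? grows one character at a time and stops at the FIRST "src".
preg_match('/<img[^>]*?src *= *["\']?([^"\']*)/i', $tag, $lazy);

echo $greedy[1], "\n"; // b.png
echo $lazy[1], "\n";   // a.png (taken from data-src)

// Either way the full match starts at "<img", so a replacement
// callback receives, and replaces, that prefix as well.
echo $greedy[0], "\n"; // <img data-src="a.png" src="b.png
```

So the quantifier decides which src is picked up when a tag has more than one candidate, but it does not by itself keep the text before the URL.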

ghoti

Add a second capturing group for everything before the URL and prepend it to the return value:

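function getURL($matches) {
  global $rootURL;
  // $matches[1] is the tag up to and including the opening quote,
  // $matches[2] is the original URL.
  return $matches[1] . $rootURL . "?type=image&URL=" . base64_encode($matches[2]);
}

// Pass the callback name as a string; a bare getURL is treated as an undefined constant.
$contents = preg_replace_callback("/(<img[^>]*src *= *[\"']?)([^\"']*)/i", 'getURL', $contents);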
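
An alternative sketch, assuming PCRE's \K escape (which PHP's preg functions support): \K drops everything matched so far from the reported match, so only the URL itself gets replaced and no extra group is needed. The $rootURL value and the sample markup are placeholders, and the \ssrc part is an added assumption so that attributes such as data-src are skipped.

```php
<?php
$rootURL  = 'https://example.com/index.php';               // placeholder value
$contents = '<p><img class="x" src="a.png" alt="A"></p>';  // sample input

$contents = preg_replace_callback(
    // Everything before \K must match, but is not part of the replaced text.
    '/<img\b[^>]*?\ssrc\s*=\s*["\']?\K[^"\'\s>]*/i',
    function ($matches) use ($rootURL) {
        // $matches[0] is only the URL, because of \K.
        return $rootURL . '?type=image&URL=' . base64_encode($matches[0]);
    },
    $contents
);

echo $contents, "\n";
```

Base64 output can contain +, / and =, so it may be worth wrapping base64_encode() in urlencode() before putting it into a query string.
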
Andreas Wong

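You can also allow for whitespace around the = sign and require the closing quote to match the opening one. In PHP, use this (the g flag from JavaScript is not a valid PCRE modifier, and it is not needed because preg_replace_callback already replaces every match):

/<\s*img[^>]*?src\s*=\s*(["'])([^"']+)\1[^>]*?>/iu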

https://regex101.com/r/jmMoio/1
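
A sketch of how a quote backreference could be combined with preg_replace_callback, capturing the text before the URL so it can be put back. The group layout, the $rootURL value and the sample tag are assumptions for illustration, not part of the pattern above.

```php
<?php
$rootURL  = 'https://example.com/index.php'; // placeholder value
$contents = "<img alt='logo' src=\"a.png\" width=\"10\">"; // sample input

// Group 1: the tag up to and including "src=".
// Group 2: the opening quote. Group 3: the URL.
// \2 requires the closing quote to be the same character as the opening one.
$pattern = '/(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2/iu';

$contents = preg_replace_callback($pattern, function ($m) use ($rootURL) {
    $url = $rootURL . '?type=image&URL=' . base64_encode($m[3]);
    // Rebuild the match: prefix, quote, new URL, same quote.
    return $m[1] . $m[2] . $url . $m[2];
}, $contents);

echo $contents, "\n";
```

Unlike the patterns with an optional quote, this one only matches quoted src values.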

Farhad Sakhaei

"I am not interested in using a DOM or XML parsing library, since this is such a small application."

Nevertheless, that is the correct approach regardless of your application size.

Remember, when you modify elements with DOMDocument, you should iterate in reverse to avoid unexpected oddities - in particular if you remove anything.
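
A minimal sketch of why the direction matters, using made-up markup: the list returned by getElementsByTagName() is live, so it shrinks while you remove nodes, and a forward loop can skip elements. Walking backwards leaves the indexes you still have to visit untouched.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<div><img src="a"><img src="b"><img src="c"></div>', // sample markup
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Live list: its length changes as soon as a matching node is removed.
$imgs = $dom->getElementsByTagName('img');

// Start at the last index and work towards 0.
for ($i = $imgs->length - 1; $i >= 0; $i--) {
    $img = $imgs->item($i);
    $img->parentNode->removeChild($img);
}

echo $imgs->length, "\n"; // 0
```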

Here's a working example using DOMDocument. It's more complicated than a regex, but not terribly difficult, and a lot more flexible and robust for any other tweaking that may be required.

// Serialize the children of $node, i.e. its inner HTML.
function inner_html($node) {
    $innerHTML = "";
    foreach ($node->childNodes as $child) {
        $innerHTML .= $node->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

function replace_src($html) {
    $rootURL = 'https://example.com';
    $dom = new DOMDocument();
    // Convert UTF-8 input to entities so loadHTML() does not misread it.
    // Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
    if (mb_detect_encoding($html, 'UTF-8', true) == 'UTF-8') {
        $html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
    }
    // Wrap the fragment in <body> so it has a single root element.
    $dom->loadHTML('<body>' . $html . '</body>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    // Walk the live node list backwards and rewrite each src attribute.
    for ($els = $dom->getElementsByTagName('img'), $i = $els->length - 1; $i >= 0; $i--) {
        $src = $els->item($i)->getAttribute('src');
        $els->item($i)->setAttribute('src', $rootURL . '?type=image&URL=' . $src);
    }
    return inner_html($dom->documentElement);
}

$html = '
    <div>
        <img src="test123">
        <img src="test456">
    </div>
';

echo replace_src($html);

OUTPUT:

<div>
    <img src="https://example.com?type=image&amp;URL=test123">
    <img src="https://example.com?type=image&amp;URL=test456">
</div>