I am using the following PHP function (in WordPress) to add data attributes to all image links:

function ccd_fancybox_image_attribute( $content ) {
       global $post;
       $pattern = "/<a(.*?)href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\")(.*?)>/i";
       $replace = '<a$1href=$2$3.$4$5 data-type="image" data-fancybox="image">';
       $content = preg_replace( $pattern, $replace, $content );
       return $content;
}
add_filter( 'the_content', 'ccd_fancybox_image_attribute' );

This correctly changes a link like this:

<a href="image.jpg"><img src="image.jpg"></a>

to this:

<a href="image.jpg" data-type="image" data-fancybox="image"><img src="image.jpg"></a>

But this code is also incorrectly affecting images that link to PDF files, oddly adding the data attributes to the image itself, not the link. For example, a PDF link like this:

<a href="file.pdf"><img src="image.jpg"></a>

turns into this:

<a href="file.pdf"><img src="image.jpg" data-type="image" data-fancybox="image"></a>

I only barely understand regex, so I'm not sure how to adjust it. How do I fix the code so that it only applies to links to image files, not links to PDFs?

Thank you!

LBF

1 Answer

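Your regex matches everything between <a href=" and the first image extension that is followed by a quote, even if that extension belongs to a later tag. In your case, the third group matches file.pdf"><img src="image, so the match ends inside the <img> tag and the attributes are added there.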
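
To see the over-matching for yourself, here is a minimal sketch that runs the pattern from the question, unchanged, against the PDF example and prints what was captured. The variable names `$html` and `$m` are only illustrative.

```php
<?php
// Pattern copied from the question, unchanged.
$pattern = "/<a(.*?)href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\")(.*?)>/i";
$html    = '<a href="file.pdf"><img src="image.jpg"></a>';

preg_match( $pattern, $html, $m );

// Group 3 was meant to hold the file name, but it runs past the end of the href.
echo $m[3], "\n"; // file.pdf"><img src="image

// The whole match therefore ends at the ">" of the <img> tag, not of the <a> tag.
echo $m[0], "\n"; // <a href="file.pdf"><img src="image.jpg">
```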

I suppose you could fix your immediate problem by updating your regular expression to:

<a(.*?)href=('|\")([^\>]*).(bmp|gif|jpeg|jpg|png)('|\")(.*?)>

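This works because [^\>]* cannot match a >, so the match can no longer run past the end of the <a> tag. For more explanation of how and why this works and the other does not, see https://regexr.com/3vcgk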
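
Dropped into the function from the question, the updated pattern might look like the sketch below. Two details go beyond the regex above and are assumptions on my part: the dot before the extension is escaped, and `$6` is put back into the replacement so that attributes after the href (a class, for example) are not lost. The `function_exists` guard is only there so the snippet also runs outside WordPress.

```php
<?php
function ccd_fancybox_image_attribute( $content ) {
    // [^>]* cannot cross the closing ">" of the <a> tag, so the match stays inside that tag.
    // \. matches a literal dot only.
    $pattern = '/<a(.*?)href=(\'|")([^>]*)\.(bmp|gif|jpeg|jpg|png)(\'|")(.*?)>/i';

    // $6 re-emits whatever followed the href before the new attributes are appended.
    $replace = '<a$1href=$2$3.$4$5$6 data-type="image" data-fancybox="image">';

    return preg_replace( $pattern, $replace, $content );
}

// Register the filter only when the WordPress API is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'ccd_fancybox_image_attribute' );
}
```

With this version the image link from the question gets the two data attributes, while the PDF link is returned unchanged.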

However, be warned that similar problems are likely to come up again: HTML is not a regular language, which makes regular expressions a notoriously fragile solution for this kind of problem. Also see this answer.
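
For comparison, here is a rough sketch of a parser-based approach using PHP's bundled `DOMDocument`, which is not something the original code used. It assumes the DOM extension is enabled; the function name `ccd_fancybox_image_attribute_dom` is hypothetical, and the encoding hint is a commonly used workaround that you should test against your own content.

```php
<?php
// Hypothetical name; a DOM-based stand-in for the regex filter from the question.
function ccd_fancybox_image_attribute_dom( $content ) {
    if ( trim( $content ) === '' ) {
        return $content;
    }

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // Keep parser warnings out of the page.
    // The wrapper <div> gives the fragment a single parent to read back from; the XML
    // declaration is a widely used workaround that hints UTF-8 input to loadHTML().
    $doc->loadHTML( '<?xml encoding="UTF-8"><div>' . $content . '</div>' );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    foreach ( $doc->getElementsByTagName( 'a' ) as $link ) {
        // Only the href of the <a> element is inspected, never the nested <img>.
        $path = (string) parse_url( $link->getAttribute( 'href' ), PHP_URL_PATH );
        if ( preg_match( '/\.(bmp|gif|jpe?g|png)$/i', $path ) ) {
            $link->setAttribute( 'data-type', 'image' );
            $link->setAttribute( 'data-fancybox', 'image' );
        }
    }

    // Serialize only the wrapper's children so no <html>/<body> shell is returned.
    $wrapper = $doc->getElementsByTagName( 'div' )->item( 0 );
    $html    = '';
    foreach ( $wrapper->childNodes as $child ) {
        $html .= $doc->saveHTML( $child );
    }
    return $html;
}
```

Because the decision is made per `<a>` element, an image inside a PDF link cannot pick up the attributes. The trade-off is that the parser re-serializes the whole fragment, so the markup may come back slightly normalized.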

Stratadox
  • Thank you! This solved my problem. I agree -- I would rather not use regex, period. But, I haven't been able to find a working alternative yet. – LBF Sep 12 '18 at 17:09
  • Your best bet is probably an HTML parsing library, such as [htmlpagedom](https://github.com/wasinger/htmlpagedom) or [php-html-parser](https://github.com/paquettg/php-html-parser). – Stratadox Sep 12 '18 at 17:18
  • Oh, and you probably want to watch out for that innocent-looking dot in `.(bmp|gif|jpeg|jpg|png)`: an unescaped dot in a regex means "any character", not "a dot character", so the regex would also match hrefs where some other character precedes the extension. Escape it as `\.` to match a literal dot. – Stratadox Sep 12 '18 at 17:21
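
The point about the unescaped dot can be checked with a short sketch. The sample strings `not-a-png` and `photo.png` are made up for illustration, and the patterns are reduced to the extension part only.

```php
<?php
$unescaped = '/.(bmp|gif|jpeg|jpg|png)$/i';  // "." means any single character
$escaped   = '/\.(bmp|gif|jpeg|jpg|png)$/i'; // "\." means a literal dot

var_dump( preg_match( $unescaped, 'not-a-png' ) ); // int(1): the "-" satisfies the dot
var_dump( preg_match( $escaped, 'not-a-png' ) );   // int(0): no literal dot before "png"
var_dump( preg_match( $escaped, 'photo.png' ) );   // int(1)
```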