I've upgraded a WYSIWYG editor from an old version to the newest one. The two versions differ in how image dimensions are saved: the old version added width and height attributes to the image tag, whereas the new one creates a style attribute and puts the width and height in it.

I have a preg_replace call that wraps an <a> tag around each <img>.

The current preg_replace no longer works, because the new editor saves width and height in the style attribute.

Preg replace:

$Content = preg_replace('#<img(.*?)src="([^"]*/)?(([^"/]*)\.[^"]*)"([^>]*?)>((?!</a>))#', '<a rel="group" class="fancybox fancy" title="" href="$2$3"><img$1src="$2$3"></a>', $Content);

In case it helps, the new editor stores images like this:

<img alt="" src="" style="" />

Whereas the old editor stored images like this:

<img src="" width="404" height="228" alt="" />

How can I refactor my preg_replace to copy the complete style attribute as well? Backwards compatibility would be nice too.

Thanks for your time :)

Andy Lester
Gilly
  • You can match all elements, do the parsing and then just replace the old ones using the original matches array. All you need is to extend preg_match with the regex to several lines of code. – baldrs Jun 07 '13 at 12:14
  • You might want to consider using a DOM parser rather than regex for this kind of thing. – Spudley Jun 07 '13 at 12:28
  • **Don't use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Jun 08 '13 at 02:47

3 Answers

Try this:

$regex = '#<img([^>]*) src="([^"/]*/?[^".]*\.[^"]*)"([^>]*)>((?!</a>))#';
$replace = '<a rel="group" class="fancybox fancy" title="" href="$2"><img$1 src="$2"$3></a>';
$Content = preg_replace($regex, $replace, $Content);
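
As a quick check of the snippet above, here is a minimal sketch that runs the same pattern over one image in each format. The sample markup and the variable names ($oldStyle, $newStyle and so on) are made up for illustration; only $regex and $replace come from the answer. The second pass is there to show that an image already followed by </a> is left alone, which is what the trailing lookahead appears to be for.

```php
<?php
// Pattern and replacement copied unchanged from the answer above.
$regex = '#<img([^>]*) src="([^"/]*/?[^".]*\.[^"]*)"([^>]*)>((?!</a>))#';
$replace = '<a rel="group" class="fancybox fancy" title="" href="$2"><img$1 src="$2"$3></a>';

// Hypothetical sample markup, one image per editor version.
$oldStyle = '<img src="images/pic.jpg" width="404" height="228" alt="" />';
$newStyle = '<img alt="" src="images/pic.jpg" style="width:404px;height:228px;" />';

$wrappedOld = preg_replace($regex, $replace, $oldStyle);
$wrappedNew = preg_replace($regex, $replace, $newStyle);

echo $wrappedOld, PHP_EOL;
echo $wrappedNew, PHP_EOL;

// Second pass: the image is now directly followed by </a>,
// so the negative lookahead rejects it and nothing is wrapped twice.
$secondPass = preg_replace($regex, $replace, $wrappedNew);
var_dump($secondPass === $wrappedNew);
```

Note that the lookahead only protects images that are immediately followed by </a>; whitespace or other markup between the two would defeat it.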
prothid

You could simplify the regex considerably. Note that this solution is only suitable if you can expect the input to be well-formed; otherwise, use an HTML parser:

$string = 'Some text <img alt="bar" title="foo" src="http://example.com/example.jpg" style="width:200px;height:400px;" /> Some text';

$new_string = preg_replace('#<img.+?src="([^"]*)".*?/?>#i', '<a href="$1">$0</a>', $string);
var_dump($new_string);

Explanation:

  • <img : match <img
  • .+? : match anything one or more times (ungreedy)
  • src=" : match src="
  • ([^"]*) : match anything except " zero or more times and group it
  • ".*?/?> : match " and then anything until /> or >
  • i modifier : match case insensitive

To be safe, you may want to use <img.+?src\s*=\s*"([^"]*)".*?/?> instead, which also allows whitespace before and after the =.
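
To illustrate the whitespace-tolerant variant, here is a small sketch that applies it to three hypothetical tags: one in the old format, one in the new format, and one with spaces around the =. The sample strings and the names $pattern, $replacement, $samples and $results are assumptions made for this example.

```php
<?php
// Whitespace-tolerant pattern from the answer; $0 in the replacement
// stands for the whole matched <img> tag, so all attributes are kept.
$pattern = '#<img.+?src\s*=\s*"([^"]*)".*?/?>#i';
$replacement = '<a href="$1">$0</a>';

// Hypothetical sample tags.
$samples = [
    'old'    => '<img src="a.jpg" width="404" height="228" alt="" />',
    'new'    => '<img alt="" src="b.png" style="width:200px;height:100px;" />',
    'spaced' => '<img alt="x" src = "c.gif" style="width:10px" >',
];

$results = [];
foreach ($samples as $label => $html) {
    $results[$label] = preg_replace($pattern, $replacement, $html);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

Unlike the pattern in the question, this one has no check for an existing </a>, so running it twice over the same content would nest the links.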

HamZa

As Spudley mentioned in his comment, you might seriously consider a DOM parser (a quick Google turns up several options), especially if you don't have control over how the editor is adding the images (although modifying the editor to add the links might be even easier, depending on which one it is – I personally might not try to hack TinyMCE to do that, but might consider it for wysihtml5).

Anyway, I digress. If you stick with the regex approach, try to simplify the regular expression as much as possible. You only need to wrap the image in an <a> tag; you don't need to care what the attributes are, as long as you preserve them.

So try something like this (I tested this expression, but in Python, not PHP, so YMMV):

 $whatever_your_string_is = preg_replace('#<img(.*?)src="([^ "]*)"([^>]*)>#', '<a href="$2"><img$1src="$2"$3></a>', $whatever_your_string_is);
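
For the DOM parser route mentioned at the start of this answer, here is a minimal sketch using PHP's bundled DOMDocument class. The function name wrapImagesInLinks is hypothetical, the rel and class values are taken from the question, and the temporary wrapper <div> is just one way to get a fragment back out. It assumes the content is an HTML fragment in ASCII or ISO-8859-1; other encodings would need extra handling.

```php
<?php
// Hypothetical helper: wraps every <img> that is not already inside an <a>.
function wrapImagesInLinks(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // A temporary <div> gives the fragment a single container element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before changing the tree.
    $images = iterator_to_array($doc->getElementsByTagName('img'));

    foreach ($images as $img) {
        $parent = $img->parentNode;
        if ($parent instanceof DOMElement && strtolower($parent->nodeName) === 'a') {
            continue; // already linked, leave it alone
        }
        $link = $doc->createElement('a');
        $link->setAttribute('rel', 'group');
        $link->setAttribute('class', 'fancybox fancy');
        $link->setAttribute('href', $img->getAttribute('src'));

        // Put the link where the image was, then move the image into it.
        $parent->replaceChild($link, $img);
        $link->appendChild($img);
    }

    // Serialize only the children of the temporary wrapper.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample content covering both editor versions.
$sample = '<p>Old <img src="a.jpg" width="404" height="228" alt=""> and new '
        . '<img alt="" src="b.png" style="width:200px;height:100px;"></p>'
        . '<a href="x"><img src="c.gif"></a>';
echo wrapImagesInLinks($sample), PHP_EOL;
```

Because the attributes are never inspected beyond src, width/height attributes and style attributes are both carried over untouched. The serializer may normalize the markup slightly, for example by dropping the trailing slash of self-closing tags.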
Michael Schuller
  • Thanks for your answer. I understand that I should use a DOM parser instead and will do so in the future. But for the time being, Prothid's answer worked out of the box for me. I've noted your suggestion :) – Gilly Jun 17 '13 at 13:37