
I have this regex:

// Remove quotes from HTML attribute values that do not contain spaces; keep quotes around URLs (href and src)

$result = preg_replace('/((\S)+\s*(?<!href)(?<!src)(=)\s*)(\"|\')(\S+)(\"|\')/', '$1$5', $string);

It's almost working as intended; I just need to adapt it with one additional exception: the quotes should also be removed from the "src" attributes of "img" tags.
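To make the current behavior concrete, here is the pattern from the question run on a small snippet (the sample markup is made up for illustration). The `class` and `alt` values lose their quotes, while `href` and the `<img>` tag's `src` keep theirs, so the `src` case is the one the extra exception has to change.

```php
<?php
// Hypothetical sample markup, used only to show what the original pattern does.
$string = '<a href="page.html" class="nav"><img src="image.jpg" alt="logo"></a>';

// The pattern from the question, unchanged: the negative lookbehinds skip
// any attribute whose name (right before "=") is href or src.
$result = preg_replace('/((\S)+\s*(?<!href)(?<!src)(=)\s*)(\"|\')(\S+)(\"|\')/', '$1$5', $string);

echo $result, "\n";
// <a href="page.html" class=nav><img src="image.jpg" alt=logo></a>
```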

Could someone please give me a tip? Any help would be greatly appreciated.

Eduardo SR
  • Why do you want to unquote attribute values? – chris85 May 10 '17 at 23:10
  • I'm creating an HTML minification script to run internally on a website. – Eduardo SR May 10 '17 at 23:12
  • Use a DOM parser library, not regexp, to parse HTML. – Barmar May 10 '17 at 23:25
  • Isn't there an HTML minification library you can use, instead of rolling your own? – Barmar May 10 '17 at 23:26
  • You need to do this using preg_replace_callback(). First match the img tag, then replace the quotes on its src attribute if necessary. –  May 10 '17 at 23:27
  • @Barmar, I'm already using Simple HTML Parser, but I don't think this is a question for a DOM parser. If it is, please point me to a way to remove quotes via the DOM. – Eduardo SR May 10 '17 at 23:34
  • Quotes aren't in the DOM; they're only in the HTML. So get the attribute from the parsed document and check whether it contains spaces. If it does, quote it in the minified version; if not, don't quote it. – Barmar May 10 '17 at 23:37
  • @Barmar, I tested the most popular HTML minification libraries on GitHub but couldn't find a perfect one, so I combined code from several of them. Now I think the only issue left is the src attributes. – Eduardo SR May 10 '17 at 23:37
  • @Barmar, the img tags I receive are already quoted and I need to remove the quotes; I can't touch the initial HTML. I know the quotes aren't in the DOM, which is why I don't think this is an issue for a DOM parser. With a DOM parser I can get the src attribute value (without the quotes) and the outer text of each img tag, but in the end I still need a regex to match the quoted src attribute. – Eduardo SR May 10 '17 at 23:57
  • The way this should work is that you parse the HTML, then walk the DOM hierarchy creating minified HTML from it. When you're creating the HTML for an attribute, wrap it in quotes if it contains a space or `>`; don't wrap it in quotes otherwise. – Barmar May 11 '17 at 00:00
  • @sln, good point. But in the end I'll still need a regex to match the src attribute. – Eduardo SR May 11 '17 at 00:02
  • @Barmar, thanks, but all the attributes in the original HTML, including those on img tags, are already quoted and I can't edit it, so I don't need to "add" quotes; I only need to "remove" them. That is the point. – Eduardo SR May 11 '17 at 00:07
  • You're minifying, so you're creating totally new HTML that should have the same semantics as the original. It's just two steps: parse the HTML into DOM structure, output optimized HTML from the DOM. – Barmar May 11 '17 at 00:10
  • Possible duplicate of [Remove all attributes from an html tag](http://stackoverflow.com/questions/3026096/remove-all-attributes-from-an-html-tag) – MTroy May 11 '17 at 01:22
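The preg_replace_callback() approach suggested in the comments can be sketched as two nested callbacks: match each start tag, then decide per attribute whether its quotes can go. This is a minimal sketch under stated assumptions, not a full minifier: it assumes no `>` appears inside attribute values and that script/style bodies and comments contain no tag-like text. The function name `unquoteAttributes` and the sample markup are hypothetical. A value is only unquoted when it is non-empty and contains no whitespace, `"`, `'`, `=`, `<`, `>` or backtick, which are the characters HTML does not allow in an unquoted attribute value; `href` always keeps its quotes, and `src` keeps them everywhere except inside `<img>`.

```php
<?php
// Hypothetical helper (sketch): unquote "safe" attribute values, keeping quotes on
// href everywhere and on src everywhere except inside <img>.
// Assumes no ">" inside attribute values and no tag-like text in scripts/comments.
function unquoteAttributes(string $html): string
{
    // Outer pass: each start tag, captured as tag name + attribute text.
    return preg_replace_callback(
        '/<([a-zA-Z][a-zA-Z0-9-]*)(\s[^>]*)>/',
        function (array $tag): string {
            $tagName = strtolower($tag[1]);

            // Inner pass: name = "value" where the value is legal unquoted.
            // Group 4 captures a "/" directly after the closing quote, if any.
            $attrs = preg_replace_callback(
                '/([^\s"\'>\/=]+)\s*=\s*(["\'])([^"\'\s=<>`]+)\2(\/?)/',
                function (array $m) use ($tagName): string {
                    $name = strtolower($m[1]);
                    $keepQuotes = $name === 'href'
                        || ($name === 'src' && $tagName !== 'img');
                    if ($keepQuotes) {
                        return $m[0]; // leave the attribute exactly as it was
                    }
                    // An unquoted value only ends at whitespace or ">", so a
                    // self-closing "/" needs a space in front of it.
                    $slash = ($m[4] ?? '') === '/' ? ' /' : '';
                    return $m[1] . '=' . $m[3] . $slash;
                },
                $tag[2]
            );

            return '<' . $tag[1] . $attrs . '>';
        },
        $html
    );
}

echo unquoteAttributes(
    '<a href="page.html" class="nav"><img src="image.jpg" alt="company logo"/></a>'
), "\n";
// <a href="page.html" class=nav><img src=image.jpg alt="company logo"/></a>
```

The `alt` value keeps its quotes because it contains a space, and a self-closing `<img src="image.jpg"/>` comes out as `<img src=image.jpg />`.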

1 Answer


I found a poorly coded solution by adapting the original regex; sorry, I'm not a regex expert.

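$imgTag = '<img src="image.jpg"/>';
// (?<=src): only unquote when the text right before "=" is "src" (this also matches e.g. data-src)
$imgTag = preg_replace('/((\S)+\s*(?<=src)(=)\s*)(\"|\')(\S+)(\"|\')/', '$1$5', $imgTag);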

Result:

<img src=image.jpg/>
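One caveat with this output: under the HTML parsing rules, an unquoted attribute value ends only at whitespace or `>`, so a `/` directly after it becomes part of the value, and `<img src=image.jpg/>` is read as `src="image.jpg/"`. A hedged variant (the function name `unquoteImgSrc` is hypothetical) restricts the replacement to `src` inside `<img>` tags, only unquotes values that are legal without quotes, and keeps a space before a self-closing slash:

```php
<?php
// Hypothetical helper (sketch): unquote only the src of <img> tags.
function unquoteImgSrc(string $html): string
{
    return preg_replace_callback(
        // Group 1: "<img ... src" (lazy, stays inside one tag); group 2: the quote;
        // group 3: a value legal without quotes; group 4: an optional "/".
        '/(<img\b[^>]*?\ssrc)\s*=\s*(["\'])([^"\'\s=<>`]+)\2(\/?)/i',
        function (array $m): string {
            // Keep a space before "/" so it is not read as part of the URL.
            $slash = ($m[4] ?? '') === '/' ? ' /' : '';
            return $m[1] . '=' . $m[3] . $slash;
        },
        $html
    );
}

echo unquoteImgSrc('<img src="image.jpg"/>'), "\n"; // <img src=image.jpg />
```

Because the match has to start at `<img` and needs whitespace right before `src`, attributes such as `data-src` and `src` on other tags like `<script>` are left alone.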
Eduardo SR