Invalid HTML - Quoting Attributes

Question

I have the following HTML:

<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
    <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
       mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
       TAX:</span></p>
</td>

Some of the HTML attributes are not quoted, for example width=140 and class=MsoNormal.

Is there a PHP function for this kind of thing? If not, what would be a clean way of sanitizing this HTML?

Thank you.

There is no native PHP function, and it's already sanitized. The **only** time that quotes (`""`) are *really* required is when there are special characters or spaces present in the value. Given that, I think it'd be best to just clean the files up yourself, using a text editor such as Sublime Text. — Ohgodwhy, Nov 07 '14 at 17:45
I have to solve this programmatically. width=140 without quotes gives me trouble because I'm using the quoted_printable_decode() function, and when it finds =140 it converts it to some unwanted character. However, width='140' (with quotes) is fine. I would like some clever way of quoting all of the attributes in the entire file. — toni rmc, Nov 07 '14 at 17:51
Maybe [a PHP DOM parser](http://simplehtmldom.sourceforge.net/)? — Jay Blanchard, Nov 07 '14 at 17:59
I advise you to not use inline styling. Separate your style from your markup, it will save you a lot of headaches. Believe me. — nunoarruda, Nov 07 '14 at 18:01
@Nuno Aruda this is the HTML I get, I didn't write it. I have to work with it. — toni rmc, Nov 07 '14 at 18:55
The HTML is not invalid. Attribute values only require quotes if the value includes particular characters (and [0-9][a-z][A-Z] are not among them). It sounds like your problem is that you are trying to decode data using quoted_printable_decode when it isn't encoded that way in the first place. — Quentin, Feb 01 '15 at 22:22

Ohgodwhy · Accepted Answer · 2014-11-07T18:27:36.373

I guess you could use a regular expression for this:

/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/g


\s match any white space character [\r\n\t\f ]
1st Capturing group ([\w]{1,}=)
    [\w]{1,} match a single character present in the list below
        Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
    \w match any word character [a-zA-Z0-9_]
    = matches the character = literally
2nd Capturing group ((?!")[\w]{1,}(?!"))
    (?!") Negative Lookahead - Assert that it is impossible to match the regex below
    " matches the characters " literally
    [\w]{1,} match a single character present in the list below
        Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
    \w match any word character [a-zA-Z0-9_]
    (?!") Negative Lookahead - Assert that it is impossible to match the regex below
    " matches the characters " literally
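g modifier: global. All matches (don't return on first match). PHP's preg_* functions do not accept a g modifier; preg_replace_callback already replaces every match, which is why the flag is dropped in the PHP code.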

Which would be implemented something like this:

echo preg_replace_callback('/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/', function($matches){
    return ' '.$matches[1].'"'.$matches[2].'"';
}, $str);

And would result in:

 <td width="140" style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
   <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
     mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
      TAX:</span></p>
 </td>

Eval.in live example

Note: this is a quick and dirty example, and it can surely be cleaned up.
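As one possible cleanup, here is a minimal sketch that only touches attributes inside opening tags and skips values that are already quoted, so plain text that happens to look like `name=value` is left alone. The function name `quoteBareAttributes` and both patterns are illustrative assumptions, not part of the answer above. It assumes there is no whitespace around `=` and that quotes inside a tag are balanced. It is still a regex, so a real HTML parser is likely the more robust choice for arbitrary input.

```php
<?php
// Hypothetical helper (not from the original answer): wrap bare attribute
// values in double quotes, leaving quoted values and text content untouched.
function quoteBareAttributes(string $html): string
{
    // Step 1: find each opening tag. Quoted values are consumed as a unit,
    // so a ">" inside a quoted value does not end the tag early.
    $tagPattern = '/<[a-zA-Z][^\s>\/]*(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/';

    return preg_replace_callback($tagPattern, function (array $tag): string {
        // Step 2: inside the tag, either skip an already quoted value
        // (group 1), or capture " name=" (group 2) and a bare value (group 3).
        $attrPattern = '/(=\s*(?:"[^"]*"|\'[^\']*\'))|(\s[^\s=<>"\'\/]+=)([^\s"\'=<>`]+)/';

        return preg_replace_callback($attrPattern, function (array $m): string {
            if ($m[1] !== '') {
                return $m[0]; // already quoted: keep as-is
            }
            return $m[2] . '"' . $m[3] . '"';
        }, $tag[0]) ?? $tag[0]; // fall back to the original tag on a PCRE error
    }, $html) ?? $html;
}

// Usage: the "rate=5" in the text content is not inside a tag, so it stays bare.
echo quoteBareAttributes("<td width=140 style='padding:0cm'><p class=MsoNormal>TAX: rate=5</p></td>");
// prints: <td width="140" style='padding:0cm'><p class="MsoNormal">TAX: rate=5</p></td>
```

The two-pass design (tags first, then attributes within each tag) is what keeps the replacement from rewriting text between tags or the contents of a quoted style value. The simpler single-pattern version matches any whitespace followed by `word=word`, wherever it appears.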

Obligatory "you can't parse HTML with regex": http://stackoverflow.com/a/1732454/1902010 — ceejayoz, Nov 07 '14 at 19:02
