
I have the following HTML:

<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
    <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
       mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
       TAX:</span></p>
</td>

Some of the HTML attribute values are not quoted, for example width=140 and class=MsoNormal.

Is there a PHP function for this sort of thing? If not, what would be a clever way of quoting these attribute values across the HTML?

Thank you.

Deduplicator
toni rmc
  • There is no native PHP function, and it's already sanitized. The **only** time quotes (`""`) are *really* required is when the value contains special characters or spaces. Given that, I think it'd be best to just clean the files up yourself, using a text editor such as Sublime. – Ohgodwhy Nov 07 '14 at 17:45
  • I have to solve this programmatically. width=140 without quotes gives me trouble because I'm using the quoted_printable_decode() function, and when it finds =140 it converts it to some unwanted character. However, width='140' (with quotes) is fine. I would like some clever way of quoting all of the attributes in the entire file. – toni rmc Nov 07 '14 at 17:51
  • Maybe [a PHP DOM parser](http://simplehtmldom.sourceforge.net/)? – Jay Blanchard Nov 07 '14 at 17:59
  • I advise you to not use inline styling. Separate your style from your markup, it will save you a lot of headaches. Believe me. – nunoarruda Nov 07 '14 at 18:01
  • @Nuno Aruda this is the HTML I get, I didn't write it. I have to work with it. – toni rmc Nov 07 '14 at 18:55
  • The HTML is not invalid. Attribute values only require quotes if the value includes particular characters (and [0-9][a-z][A-Z] are not among them). It sounds like your problem is that you are trying to decode data using quoted_printable_decode when it isn't encoded that way in the first place. – Quentin Feb 01 '15 at 22:22
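
The decoding problem described in the comments above can be reproduced in a few lines. This is only a minimal sketch for illustration; the variable names are made up here. In quoted-printable, "=" followed by two hex digits is an escape for one byte, so "=14" in width=140 is decoded to the control byte 0x14, while "='1" is not a valid escape and is left alone.

```php
<?php
// Hypothetical sample inputs, taken from the attribute in the question.
$bare   = 'width=140';
$quoted = "width='140'";

$decodedBare   = quoted_printable_decode($bare);
$decodedQuoted = quoted_printable_decode($quoted);

// "=14" is consumed as an escape for byte 0x14; the trailing "0" is kept as text.
var_dump($decodedBare === 'width' . chr(0x14) . '0'); // bool(true)

// "='1" is not "=" plus two hex digits, so the string passes through unchanged.
var_dump($decodedQuoted === $quoted); // bool(true)
```

This also fits Quentin's point: if the input was never quoted-printable encoded, decoding it will corrupt any "=" that happens to be followed by two hex digits, quoted attributes or not.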

1 Answer


I guess you could use a regular expression for this:

/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/g


\s match any white space character [\r\n\t\f ]
1st Capturing group ([\w]{1,}=)
    [\w]{1,} match a single character present in the list below
        Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
    \w match any word character [a-zA-Z0-9_]
    = matches the character = literally
2nd Capturing group ((?!")[\w]{1,}(?!"))
    (?!") Negative Lookahead - Assert that it is impossible to match the regex below
    " matches the characters " literally
    [\w]{1,} match a single character present in the list below
        Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
    \w match any word character [a-zA-Z0-9_]
    (?!") Negative Lookahead - Assert that it is impossible to match the regex below
    " matches the characters " literally
g modifier: global. All matches (don't return on first match)

In PHP this would be implemented something like this (the preg functions have no g modifier; preg_replace_callback already replaces every match):

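// $str holds the HTML shown in the question.
echo preg_replace_callback('/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/', function ($matches) {
    // Emit a space in place of the matched whitespace, then name=, then the value in double quotes.
    return ' '.$matches[1].'"'.$matches[2].'"';
}, $str);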

And would result in:

 <td width="140" style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
   <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
     mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
      TAX:</span></p>
 </td>


Note: this is a quick and dirty example, and it can surely be cleaned up.
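
One possible cleanup, as a rough sketch rather than a definitive solution: only look inside opening tags, and skip values that are already quoted, so that text such as x=1 between tags or inside a style attribute is never touched. The function name quoteBareAttributes is hypothetical (it is not a PHP built-in), and this is still regex-based, so it makes no attempt to handle comments, script blocks, or badly broken markup.

```php
<?php
// Hypothetical helper (not part of PHP): wraps unquoted attribute values in double quotes.
function quoteBareAttributes(string $html): string
{
    // Outer pattern: one opening tag. Quoted values are consumed whole,
    // so a ">" or "=" inside quotes cannot end or confuse the match.
    $tagPattern = '/<[a-zA-Z][^\s>\/]*(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/';

    $result = preg_replace_callback($tagPattern, function (array $tag): string {
        // Inner pattern: either an already quoted string (left as is),
        // or whitespace + name= followed by a bare value.
        $inner = preg_replace_callback(
            '/"[^"]*"|\'[^\']*\'|(\s[\w:-]+=)([^\s"\'>]+)/',
            function (array $m): string {
                if (!isset($m[1])) {
                    return $m[0]; // quoted string: return unchanged
                }
                return $m[1] . '"' . $m[2] . '"';
            },
            $tag[0]
        );
        return $inner ?? $tag[0]; // on a PCRE error, keep the tag as it was
    }, $html);

    return $result ?? $html; // on a PCRE error, return the input untouched
}

echo quoteBareAttributes('<td width=140>'), "\n"; // prints <td width="140">
```

Unlike the pattern above, this keeps the original whitespace before each attribute instead of replacing it with a single space.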

Ohgodwhy