I'm using HTML Purifier, a PHP "filter that guards against XSS and ensures standards-compliant output," to sanitize/standardize user-inputted markup.

This is an example of the user-inputted markup:

<font face="'Times New Roman', Times">TEST</font>

which generates:

<span style="font-family:&quot;Times New Roman&quot;, Times;">TEST</span>

I'm a bit confused, because &quot; isn't even the entity for a single quote (it's the one for a double quote). What's the best practice here, since I'm going to be using this user-generated content later?

  • Leave as is
  • Replace all &quot; with \' after the purifier runs
  • Configure HTML Purifier differently
  • Something else?
Kyle Cureau

2 Answers

Looks okay to me.

I think the conversion from a single to a double quote comes from the fact that HTML Purifier takes apart the entire tag and puts it back together according to its own rules, which happen to use double quotes when quoting values inside a style attribute. Since the attribute itself is delimited by double quotes, those inner quotes have to be written as &quot;.

It also validates fine for me. What doctype are you validating against?

If I'm not overlooking something, I'd say this is fine to use as is.
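
To see why the purified output is equivalent, here is a minimal sketch that decodes the style value from the question the way an HTML parser would. It only uses PHP's built-in `html_entity_decode`; the variable names are made up for illustration, and it does not call HTML Purifier itself.

```php
<?php
// Style attribute value exactly as shown in the question's purified output.
$purified = 'font-family:&quot;Times New Roman&quot;, Times;';

// Decode character references, roughly what a browser does before
// handing the attribute value to the CSS parser.
$decoded = html_entity_decode($purified, ENT_QUOTES, 'UTF-8');

echo $decoded, PHP_EOL; // font-family:"Times New Roman", Times;
```

CSS accepts either single or double quotes around a font family name, so the decoded value should mean the same thing as the original `'Times New Roman', Times`.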

Pekka
  • Great - if it looks good to you, then I'll use it! Thank you! Also, I took out the validation comment from my post...it validates fine in XHTML 1.0 Strict, which is the one I needed. – Kyle Cureau Sep 05 '10 at 09:34
The output is XHTML-valid but the entity conversion is wrong. <img src="/test" alt="I'm ok"/> would get converted to <img src="/test" alt="I&quot;m ok">

Something simple like this will suffice:

$allowed_tags = '<font>';
echo htmlspecialchars(strip_tags(rawurldecode($input), $allowed_tags), ENT_COMPAT, 'UTF-8');

but it won't convert the <font> tag to <span>.
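
As a rough check of what the flags in the snippet above do, the sketch below runs `htmlspecialchars` with `ENT_COMPAT` on a made-up input string (the `$input` value is an assumption, not taken from the thread). It suggests two things: single quotes are left alone, and the angle brackets of the allowed `<font>` tag are escaped as well, so the tag would show up as text rather than render.

```php
<?php
// Hypothetical input mixing a tag, a single quote, and double quotes.
$input = '<font face="Times">I\'m "ok"</font>';

// ENT_COMPAT escapes double quotes but leaves single quotes untouched.
// htmlspecialchars also escapes <, > and &, regardless of the quote flag.
$escaped = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

echo $escaped, PHP_EOL;
// &lt;font face=&quot;Times&quot;&gt;I'm &quot;ok&quot;&lt;/font&gt;
```

If the goal is to keep the tag as live markup, escaping the whole string this way is probably not what you want.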

bcosca
  • The entity conversion is not *wrong* as such: HTML Purifier deconstructs the whole thing and glues it back together with a new syntax. That syntax happens to use `&quot;` instead of single quotes. I don't really see anything wrong with that. – Pekka Sep 05 '10 at 09:52
  • What you said would totally make sense. But I just tried it and I get `I'm here`, which means HTML Purifier must be sensitive to the attributes. But +1 for the use case...I didn't think of that and it was definitely worth testing. That solution should be good for someone who gets the `alt="I" ok"` – Kyle Cureau Sep 05 '10 at 09:55
  • @Pekka I think stillstanding was saying that `&quot;` would have been inappropriate in his example, since a single quote would have been desired in the alt attribute. – Kyle Cureau Sep 05 '10 at 09:57