I have a strange issue. I use CKEditor 4 to collect formatted text from the user as HTML. The HTML content is then filtered on the server using HTML Purifier.

When the user types quotes like ”, ’, and “, CKEditor converts them into HTML entities like &rdquo;, &rsquo;, and &ldquo;, which is fine. The issue is that when I filter the content with HTML Purifier, these entities get automatically decoded. This prevents the content from being presented to the user for later editing, because the quotes end up literally encoded in strange ways like â€œ

How do I fix this? I think that if I could stop HTML Purifier from automatically decoding entities, this would work, but I am new to HTML Purifier, so I can't find a way.

I have tried running htmlentities on the input before passing it to HTML Purifier, but that encodes the whole HTML, tags included, which stops HTML Purifier from purifying anything at all.
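
A minimal sketch of what I mean, with a made-up sample string (not my real content). The tags get escaped along with the quotes, so there is no markup left for the purifier to work on:

```php
<?php
// Hypothetical sample input: a paragraph containing curly quotes.
$html = "<p>\u{201C}Hi\u{201D}</p>";

// htmlentities() escapes every character that has a named entity,
// including the angle brackets of the tags themselves.
$encoded = htmlentities($html, ENT_QUOTES, 'UTF-8');

echo $encoded, PHP_EOL;
// &lt;p&gt;&ldquo;Hi&rdquo;&lt;/p&gt;
```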

Mohamed Mufeed

1 Answer

After CBroe's comment, I found out that my application is not using UTF-8 all the way through.
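
To illustrate what goes wrong when part of the stack is not UTF-8, here is a minimal sketch (not taken from my application) that reproduces the garbling in plain PHP. It needs the mbstring extension, and it assumes the wrong charset involved is Windows-1252; another setup may misread the bytes differently.

```php
<?php
// A left double quotation mark (U+201C) as a UTF-8 string: bytes E2 80 9C.
$quote = "\u{201C}";

// Simulate a layer that wrongly treats those UTF-8 bytes as Windows-1252
// and re-encodes each byte as a separate character (assumed failure mode).
$garbled = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');

echo bin2hex($quote), PHP_EOL; // e2809c
echo $garbled, PHP_EOL;        // â€œ
```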

And I can't fix that either. For those in a similar situation, I found a workaround: HTML Purifier supports a configuration directive that encodes all non-ASCII characters. It comes with some trade-offs, but they are fine in my case (I think).

You can enable the HTML Purifier directive Core.EscapeNonASCIICharacters like so:

$config->set('Core.EscapeNonASCIICharacters', true);

That did the trick for me.
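
As a rough picture of what this directive does to the output, the sketch below converts every non-ASCII character to a decimal numeric entity using PHP's mbstring extension. It only approximates the effect and is not HTML Purifier's own code; the helper name escapeNonAscii is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: replace each non-ASCII character in a UTF-8 string
 * with its decimal numeric entity, leaving ASCII (including tags) untouched.
 */
function escapeNonAscii(string $utf8): string
{
    // Convmap format: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo escapeNonAscii("<p>\u{201C}Hi\u{201D}</p>"), PHP_EOL;
// <p>&#8220;Hi&#8221;</p>
```

Because the result is pure ASCII, it should pass through a non-UTF-8 database or connection unchanged, and the browser or CKEditor can decode the entities back into the original quotes.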


This is the full function:

/**
 * Purifies dirty html
 *
 * @param string $dirty_html
 * @return string
 */
function purifyHtml($dirty_html)
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core.Encoding', 'UTF-8');
    $config->set('Core.EscapeNonASCIICharacters', true);
    $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
    $config->set('Cache.SerializerPath', getStoragePath('cache/html-purifier'));

    $htmlPurifier = new HTMLPurifier($config);
    return $htmlPurifier->purify($dirty_html);
}
Mohamed Mufeed
  • Glad you figured it out, and thanks for sharing your solution. In case you don't know already, you can accept your own answer in two days to mark this question as resolved :) https://stackoverflow.blog/2009/01/06/accept-your-own-answers/ – pinkgothic Dec 15 '20 at 12:29
  • Ah I see. Thanks – Mohamed Mufeed Dec 15 '20 at 12:32
  • @MohamedMufeed I ran into the same issue and tried your solution. It's not working for me. I wonder if it's my config. How did you set up your config? I did `$config = \HTMLPurifier_Config::createDefault();`. If you can post your code that would be awesome. – Fly_Moe Mar 25 '22 at 15:01
  • @Fly_Moe updated. I must say it's been too long, so I don't remember much of what I used to know about the configuration. – Mohamed Mufeed Mar 26 '22 at 08:21