
I need to save the "plain" version of HTML content coming from a textarea with a WYSIWYG editor. Right now I'm using the following function right before saving into the database:

public function preUpdate(PreUpdateEventArgs $event)
{
    if (($resource = $event->getEntity()) instanceof Resource) {
        $resource->setPlainContent($this->computePlainContent($resource));
    }
}

protected function computePlainContent(Resource $resource)
{
    return preg_replace(
        '/\s+/',
        ' ',
        html_entity_decode(
            strip_tags($resource->getContent()),
            ENT_QUOTES | ENT_HTML401
        )
    );
}

Plain text will be used for searching among pages.

Questions:

  • is this good/safe**, assuming the editor will always produce valid HTML?
  • would you remove punctuation marks, and how?
  • should I use ENT_HTML401 or ENT_XHTML with CKEditor (default configuration; I don't know the quality of its output)?

** By safe I mean safe to produce a good output. Users (of this system) are trusted.
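To make the punctuation question concrete, here is a minimal sketch of what I have in mind. It assumes PHP 7+ and valid UTF-8 input; `stripPunctuation` is a hypothetical helper name, not something that exists in my code yet.

```php
<?php
// Hypothetical helper: replaces every run of Unicode punctuation (\p{P})
// with a space, then collapses whitespace. Assumes valid UTF-8 input;
// with the /u modifier preg_replace() returns null on invalid UTF-8,
// in which case the text from the previous step is kept.
function stripPunctuation(string $text): string
{
    $text = preg_replace('/\p{P}+/u', ' ', $text) ?? $text;
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return trim($text);
}

echo stripPunctuation("Hello, world! It's done."), PHP_EOL;
// prints "Hello world It s done"
```

Note that the apostrophe is punctuation too, so "It's" is split into "It s"; whether that is acceptable probably depends on how the search itself tokenizes queries.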

gremo
  • You should always check user input server side. – Fez Vrasta Nov 10 '13 at 00:34
  • @FezVrasta are you talking about content or plainContent? By the way our users are trusted. – gremo Nov 10 '13 at 00:36
  • Of course it's not safe lol. You're transforming things like `&gt;` into `>` using html_entity_decode() – nice ass Nov 10 '13 at 00:36
  • @onetrickpony I'm stripping tags right before decoding... – gremo Nov 10 '13 at 00:37
  • `&lt;script&gt;` is not a tag, you're making it a tag – nice ass Nov 10 '13 at 00:37
  • @onetrickpony I understand now. I've updated the question, "safe" for good output point of view. My bad. – gremo Nov 10 '13 at 00:39
  • possible duplicate: http://stackoverflow.com/questions/1884550/converting-html-to-plain-text-in-php-for-e-mail – bitWorking Nov 10 '13 at 02:33
  • Just to add my two bits: You can also use jQuery to get the plaintext. As for the `&gt;` etc, that can cause real problems if your data is not 100% managed by CKE, take a look for example at http://jsfiddle.net/qdndP/129/ - check the source HTML, then check what is displayed in the Editor and then check the editor source mode. As you see, CKE has some pretty cool logic in it to crunch tags, but they are expected to follow a good rule set. If that content was written in CKE (even in source mode!), it would be fine. Is this done for indexing / search reasons? – Joel Peltonen Nov 11 '13 at 07:38
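The point raised in the comments about decoding after stripping can be reproduced with a small sketch. The input string is a made-up example of what an editor might emit when a user types a literal tag; the pipeline is the same as in `computePlainContent()`.

```php
<?php
// Hypothetical editor output: the user typed a literal <script> tag,
// so the editor stored it as escaped entities inside a paragraph.
$content = '<p>Use &lt;script&gt;alert(1)&lt;/script&gt; &amp; enjoy</p>';

// Same steps as computePlainContent(): strip tags, decode, collapse spaces.
$plain = preg_replace(
    '/\s+/',
    ' ',
    html_entity_decode(strip_tags($content), ENT_QUOTES | ENT_HTML401)
);

// strip_tags() only removed the real <p> element; the decoded text now
// contains a literal <script> element.
echo $plain, PHP_EOL;

// Escape again if the plain text is ever written into an HTML page.
echo htmlspecialchars($plain, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

This is harmless as long as the plain text is only used as a search index, but it would need escaping again before being displayed.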
