I have a CMS running. I am seeing that my CMS users are entering special characters coming from copy & paste actions in Word, etc.

So in the meta description content attribute I am seeing a diamond in place of a slanted single right-quote.

I went into the database and changed the single quote to ’.

So my CMS now renders it’s, for example.

I am trying to convert the string at render time using PHP's htmlentities(), htmlspecialchars_decode(), and html_entity_decode().

Each of these calls still renders it’s.

Is there a PHP function I should use to translate these characters? Does it even matter? That is, can I have HTML entities in a meta tag that is essential for SEO?

Hope this is clear...thanks for any feedback.

H. Ferrence

1 Answer

This is the one I've put together for exactly the same reason. You could remove the strip_tags() line if you are happy to keep HTML in their posts.

    function convert_to_plaintext($message)
    {
        // Remove this line if you are happy to keep HTML in the posts.
        $message = strip_tags($message);

        // Quotes: replace smart double quotes with straight double quotes.
        // ANSI version for single-byte Windows code page 1252 input only;
        // on UTF-8 input these bytes can occur inside multi-byte characters.
        $message = preg_replace('/[\x84\x93\x94]/', '"', $message);

        // Quotes: replace smart single quotes and apostrophes with straight single quotes.
        // ANSI version, same code page 1252 assumption as above.
        $message = preg_replace('/[\x82\x91\x92]/', "'", $message);

        // Quotes: replace %uXXXX-escaped smart double quotes with straight double quotes.
        $message = str_replace(array('%u201C','%u201D','%u201E','%u201F','%u2033','%u2036'), '"', $message);

        // Quotes: replace %uXXXX-escaped smart single quotes with straight single quotes.
        $message = str_replace(array('%u2018','%u2019','%u201A','%u201B','%u2032','%u2035'), "'", $message);

        // Straight-to-smart conversion is deliberately not done here, since the goal is plain text.

        // Collapse runs of three or more newlines into a single blank line,
        // then turn the remaining newlines into HTML line breaks.
        $message = preg_replace("/\n{3,}/", "\n\n", $message);
        $message = str_replace("\n", '<br/>', $message);

        return $message;
    }
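
The "Unicode version" comments in the function refer to code points such as U+2019, but the `str_replace()` calls only match the literal `%u2019` escape text. If the stored text is real UTF-8, something along these lines might be closer to what is needed. This is only a sketch: `straighten_quotes_utf8()` is a hypothetical helper name, it assumes valid UTF-8 input, and the `"\u{...}"` escape syntax requires PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not part of the answer's function): assumes $text is valid UTF-8.
function straighten_quotes_utf8(string $text): string
{
    // Map Unicode quote-like characters to their ASCII equivalents.
    $map = [
        "\u{201C}" => '"', "\u{201D}" => '"', "\u{201E}" => '"',
        "\u{201F}" => '"', "\u{2033}" => '"', "\u{2036}" => '"',
        "\u{2018}" => "'", "\u{2019}" => "'", "\u{201A}" => "'",
        "\u{201B}" => "'", "\u{2032}" => "'", "\u{2035}" => "'",
    ];

    // strtr() with an array replaces whole byte sequences, so in valid UTF-8
    // it cannot split a multi-byte character the way a byte-level regex can.
    return strtr($text, $map);
}

$description = straighten_quotes_utf8("It\u{2019}s a \u{201C}test\u{201D}");

// Escape the result before placing it inside a double-quoted HTML attribute.
echo '<meta name="description" content="'
    . htmlspecialchars($description, ENT_QUOTES, 'UTF-8')
    . '">' . "\n";
```

Escaping with `htmlspecialchars()` at output time means the database can hold the plain character rather than a hand-edited entity.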
Janis Veinbergs
Luc
  • Sorry, the iPhone doesn't seem to support marking up my code properly. It looks terrible. – Luc Mar 23 '12 at 12:49
  • Those last lines don't look good: [Removing redundant line breaks with regular expressions](http://stackoverflow.com/questions/816085/removing-redundant-line-breaks-with-regular-expressions). See [this snippet](http://codepad.org/9sJRYbaX). – Janis Veinbergs Mar 23 '12 at 13:08
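
For reference, the regular-expression approach suggested in the last comment could look roughly like the sketch below. `collapse_blank_lines()` is a hypothetical name, and normalizing `\r\n` and `\r` to `\n` first is an added assumption about the input, not something stated in the answer.

```php
<?php
// Hypothetical helper: collapse any run of three or more line breaks
// into exactly two (that is, a single blank line).
function collapse_blank_lines(string $text): string
{
    // Normalize Windows (\r\n) and bare \r line endings to \n first.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // One pass handles runs of any length, unlike repeated str_replace() calls.
    // preg_replace() returns null on error, so fall back to the normalized text.
    return preg_replace('/\n{3,}/', "\n\n", $text) ?? $text;
}
```

A single pattern with a `{3,}` quantifier avoids having to guess how many times the three-newline replacement must be repeated.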