I have a string being entered through a WYSIWYG editor (https://summernote.org/). The problem I'm encountering is that when someone pastes text from somewhere like Google Docs, it arrives with a typographic (curly) apostrophe, as follows:

child’s

I'm then storing that in the database (character set: utf8mb4, collation: utf8mb4_unicode_ci).

When the data is displayed, the apostrophe appears as a question mark symbol in Safari on a Mac but appears normal in Chrome on a PC.

I'm also sending an email with that text in it and it breaks the email off completely when it hits that apostrophe.

I've tried the following solutions but none seem to help me:

$str = mb_convert_encoding($str, 'UTF-8','utf8mb4');
$str = str_replace("’","'",$str);
$str = strtr($str,array("’" => "'"));

echo mb_detect_encoding($str); // yields UTF-8
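
A literal `’` in the source file only matches the input if both are stored as the same bytes, so it can help to look at the raw bytes before replacing anything. The following is a minimal sketch, not a fix: `inspect_bytes` is a hypothetical helper name, and the two sample strings are assumptions standing in for what the editor might submit (the apostrophe as UTF-8, and the same character as a single Windows-1252 byte).

```php
<?php
// Hypothetical helper: report the raw bytes of a string and whether
// they form a valid UTF-8 sequence.
function inspect_bytes(string $s): array
{
    return [
        'hex'        => bin2hex($s),
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'),
    ];
}

// Assumed sample inputs, written with escapes so the encoding of this
// source file does not matter.
$utf8   = "child\u{2019}s"; // U+2019 encoded as UTF-8 (E2 80 99)
$cp1252 = "child\x92s";     // the same apostrophe as one Windows-1252 byte

print_r(inspect_bytes($utf8));   // hex: 6368696c64e2809973, valid_utf8: true
print_r(inspect_bytes($cp1252)); // hex: 6368696c649273,     valid_utf8: false

// If the input turns out to be Windows-1252, it can be converted first.
$converted = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
```

If the hex dump shows `e28099`, the input is already valid UTF-8 and the display problem is more likely on the output side (page or email charset) than in the string itself.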

My ideal method would be to convert the character to a 'regular' apostrophe before storing it in the DB.

I've seen similar questions and have tested every answer I could find, including the ones that weren't accepted, but none have worked.

I'm sending the email using PHPMailer via AWS Simple Email Service.
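
For the email side, one thing worth checking is the charset PHPMailer declares for the message. As far as I know, its `CharSet` property defaults to `iso-8859-1`, so UTF-8 text sent without changing it can be mislabelled. The sketch below assumes PHPMailer 6 is installed through Composer; `use_utf8` is a hypothetical helper name, and whether this resolves the truncation described above is untested.

```php
<?php
// Hypothetical helper: declare the message as UTF-8 and pick a transfer
// encoding that is safe for non-ASCII bytes. Assumes the PHPMailer 6 class
// is available (for example via Composer's autoloader) when this is called.
function use_utf8(\PHPMailer\PHPMailer\PHPMailer $mail): \PHPMailer\PHPMailer\PHPMailer
{
    $mail->CharSet  = 'UTF-8';  // the library default is iso-8859-1
    $mail->Encoding = 'base64'; // keeps multi-byte characters intact in transit
    return $mail;
}
```

It would be called once on the mailer instance before setting `Subject` and `Body`, for example `use_utf8($mail);`.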

It appears the following works for replacing the apostrophe:

// Replace typographic quotes, dashes and ellipses with ASCII equivalents.
// Search strings are written as UTF-8 byte escapes, so the encoding of
// this source file does not affect the match.
function convert_smart_quotes($string)
{
    $search = [
        "\xC2\xAB",     // « (U+00AB) in UTF-8
        "\xC2\xBB",     // » (U+00BB) in UTF-8
        "\xE2\x80\x98", // ‘ (U+2018) in UTF-8
        "\xE2\x80\x99", // ’ (U+2019) in UTF-8
        "\xE2\x80\x9A", // ‚ (U+201A) in UTF-8
        "\xE2\x80\x9B", // ‛ (U+201B) in UTF-8
        "\xE2\x80\x9C", // “ (U+201C) in UTF-8
        "\xE2\x80\x9D", // ” (U+201D) in UTF-8
        "\xE2\x80\x9E", // „ (U+201E) in UTF-8
        "\xE2\x80\x9F", // ‟ (U+201F) in UTF-8
        "\xE2\x80\xB9", // ‹ (U+2039) in UTF-8
        "\xE2\x80\xBA", // › (U+203A) in UTF-8
        "\xE2\x80\x93", // – (U+2013) in UTF-8
        "\xE2\x80\x94", // — (U+2014) in UTF-8
        "\xE2\x80\xA6", // … (U+2026) in UTF-8
    ];

    // Same order as $search: entry N here replaces entry N above.
    $replacements = [
        "<<",
        ">>",
        "'",
        "'",
        "'",
        "'",
        '"',
        '"',
        '"',
        '"',
        "<",
        ">",
        "-",
        "-",
        "...",
    ];

    return str_replace($search, $replacements, $string);
}
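
Since the goal stated earlier is to normalise the character before it is stored, here is a condensed sketch of the same idea applied at that point. It swaps `str_replace` for `strtr` with a single map so each search string sits next to its replacement; the name `normalize_quotes`, the reduced map, and the sample input are all assumptions for illustration.

```php
<?php
// Hypothetical condensed variant: one map, keys written as UTF-8 byte
// escapes so the encoding of this source file is irrelevant.
function normalize_quotes(string $s): string
{
    static $map = [
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => "-",   // U+2013 en dash
        "\xE2\x80\x94" => "-",   // U+2014 em dash
        "\xE2\x80\xA6" => "...", // U+2026 ellipsis
    ];

    return strtr($s, $map);
}

// Assumed sample standing in for the submitted editor content.
$submitted = "the child\u{2019}s \u{201C}toy\u{201D}";
$clean     = normalize_quotes($submitted);

echo $clean, PHP_EOL; // the child's "toy"
```

The cleaned value would then be passed to the database layer in place of the raw input. Note that this only matches input that is already UTF-8; bytes in another encoding pass through unchanged.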
Cary
  • what mailer are you using? – flaxon Oct 06 '21 at 01:53
  • You might have to set the header to UTF-8: `header('Content-Type: text/html; charset=utf-8');` https://dev.to/lutvit/how-to-make-the-php-mail-function-awesome-3cii – flaxon Oct 06 '21 at 01:55
  • @flaxon just edited question. PHPMailer via AWS. – Cary Oct 06 '21 at 02:01
  • Check https://stackoverflow.com/a/2493009/12239849. In my case it was a UTF-8 problem: I had an HTML template with the wrong encoding. – flaxon Oct 06 '21 at 02:04
  • Your attempts at replacing would only work _if_ `’` in your code were actually the same byte sequence as what you are sent, which it doesn't have to be. I'd start by figuring out what bytes you are actually sent in the first place, and then whether those are a valid UTF-8 sequence or not. – CBroe Oct 06 '21 at 07:22
  • It's a configuration problem; see "question mark" in https://stackoverflow.com/questions/38363566/trouble-with-utf-8-characters-what-i-see-is-not-what-i-stored – Rick James Oct 30 '21 at 03:28
