I've tried to find an answer to this, and I suspect it has to do with encoding, but after many unsuccessful attempts I decided to ask here. Sorry if it's a duplicate.

I have a textarea input field whose content is stored in a MySQL text column. When the text is typed directly into the textarea, it is stored and displayed fine. When it's typed into Word and then copied and pasted (something my users insist on doing), apostrophes and double quotes turn into this:

’ is displayed as â€™
“ is displayed as â€œ
” is displayed as â€

However, I'm sure it's an encoding problem on the way out, because if I issue my SELECT statements from the command line, the text displays fine. It's only when I look at the data via the web (phpMyAdmin or my actual application) that it is garbled.
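The garbled sequences above are consistent with UTF-8 bytes being read as Windows-1252 somewhere on the way out. The following is only a sketch of that assumption, not a diagnosis of this particular setup: the helper name `misreadUtf8AsCp1252` and the table `CP1252_HIGH` are made up for illustration.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F. Every other byte maps to the
// code point with the same number, as in ISO-8859-1. The five bytes that
// Windows-1252 leaves undefined are passed through as C1 control characters.
const CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

// Hypothetical helper: encode the text as UTF-8, then misread every byte
// as a Windows-1252 character.
function misreadUtf8AsCp1252(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let out = "";
  for (const b of bytes) {
    const codePoint = b >= 0x80 && b <= 0x9F ? CP1252_HIGH[b - 0x80] : b;
    out += String.fromCharCode(codePoint);
  }
  return out;
}

console.log(misreadUtf8AsCp1252("\u2019")); // right single quotation mark
console.log(misreadUtf8AsCp1252("\u201C")); // left double quotation mark
console.log(misreadUtf8AsCp1252("\u201D")); // right double quotation mark
```

The right double quote ends in byte 0x9D, which Windows-1252 does not define, so its third character is invisible. That would explain why only "â€" shows up for it.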

I tried this:

$output = str_replace("â€™","'",$input);

and

$output = str_replace("\â\€\™","\'",$input);

etc., but with no effect. I downloaded Encoding::toUTF8 (mentioned in Detect encoding and make everything UTF-8), and it manages to replace the problematic strings, but indiscriminately with question marks instead of the originals.
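If the stored bytes are valid UTF-8 and are only misread on output (an assumption, suggested by the command line showing the text correctly), the garbling is reversible in principle. The sketch below illustrates the idea with a hypothetical helper `repairMojibake`; it maps each character back to its Windows-1252 byte and decodes the bytes as UTF-8. Making the database connection and the page agree on one character set is likely the more durable fix.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F (undefined bytes kept as C1 controls).
const CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

// Hypothetical helper: undo "UTF-8 read as Windows-1252" garbling.
// Returns the input unchanged when it does not look garbled in that way.
function repairMojibake(garbled) {
  const bytes = [];
  for (const ch of garbled) {
    const cp = ch.codePointAt(0);
    const idx = CP1252_HIGH.indexOf(cp);
    if (idx !== -1) {
      bytes.push(0x80 + idx); // character from the 0x80..0x9F block
    } else if (cp <= 0xFF) {
      bytes.push(cp); // same value as in ISO-8859-1
    } else {
      return garbled; // not representable in Windows-1252
    }
  }
  try {
    // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD
    return new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(bytes));
  } catch (e) {
    return garbled; // bytes are not valid UTF-8, so leave the text alone
  }
}

console.log(repairMojibake("It\u00E2\u20AC\u2122s")); // garbled form of "It’s"
```

Returning the input untouched on failure is a deliberate choice here: text that was never garbled (for example a lone "é") does not form valid UTF-8 after the reverse mapping, so it is left as it was.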

I kind of feel like I'm poking around in the dark, and would appreciate any pointers!

Community
yycroman
  • http://www.joelonsoftware.com/articles/Unicode.html and http://stackoverflow.com/questions/175785/how-do-i-convert-word-smart-quotes-and-em-dashes-in-a-string – Cyclonecode Dec 01 '14 at 06:37
  • Question http://stackoverflow.com/questions/175785/how-do-i-convert-word-smart-quotes-and-em-dashes-in-a-string is related but not the same: it specifically asks how to convert smart quotes to "normal" (i.e. ASCII) quotes and em dashes to “regular dashes” (i.e. ASCII hyphens), and the accepted answer is just a reference to a general tutorial. It is difficult to see what was really asked there, and what is really being asked here. Should “smart” characters be handled properly, or turned into dull ASCII? How does the problem really originate? – Jukka K. Korpela Dec 01 '14 at 07:33

1 Answer

Actually, the problem is not happening in PHP. It comes from the text being copied and pasted from Word, so you need to handle it in JavaScript before you pass the text to PHP:

// Replaces commonly used Windows-1252 characters that do not exist in ASCII or ISO-8859-1 with ISO-8859-1 cognates.
var replaceWordChars = function(text) {
    var s = text;
    // smart single quotes and apostrophe
    // (no "|" inside a character class: there it would match a literal pipe)
    s = s.replace(/[\u2018\u2019\u201A]/g, "'");
    // smart double quotes
    s = s.replace(/[\u201C\u201D\u201E]/g, "\"");
    // ellipsis
    s = s.replace(/\u2026/g, "...");
    // en dash and em dash
    s = s.replace(/[\u2013\u2014]/g, "-");
    // circumflex
    s = s.replace(/\u02C6/g, "^");
    // open angle bracket
    s = s.replace(/\u2039/g, "<");
    // close angle bracket
    s = s.replace(/\u203A/g, ">");
    // small tilde and non-breaking space both become a plain space
    s = s.replace(/[\u02DC\u00A0]/g, " ");

    return s;
};

//Use like:
var newText = replaceWordChars(textToCheck);

From: https://stackoverflow.com/a/6219023/1857295 .

Community
Billel Hacaine