In a web app I use a <div id="xxx" contentEditable=true > for editing. The result of encodeURIComponent(xxx.innerHTML) is sent via an Ajax POST request to a server, where a PHP script creates a simple txt file from it. The user can then download this file to store it locally or display it on screen. It works perfectly so far, except that the character encoding is a mess. All special characters, like the German Ä, are interpreted wrongly, in this case as ä. I have been googling for some days and have studied PHP functions like iconv(). I also know how to set a browser's character encoding and how to set up a text editor to decode the file accordingly. But nothing helps; it is still a mess, or becomes even weirder.

So my question is: where in this encoding/decoding round trip from the browser to the server and back to the browser do I have to do what, to ensure that an Ä will still be an Ä?

Ben

2 Answers

I am answering my own question, because the problem turned out to be different from the one stated above. The contenteditable is actually part of a larger section of HTML code. On the server side, in PHP, I need to extract the contenteditable text, which I do via a DOMDocument like this:

$doc = new DOMDocument();
$doc->loadHTML($_POST["data"]);

Then I access the elements and their textual content as usual. Finally I save the text with

file_put_contents($txtFile,  $plainText, LOCK_EX);

The saved text was then a mess, as described above. It turns out that you need to tell the DOMDocument which character set loadHTML() should assume for its input, in this case UTF-8. First I did it as recommended in the PHP documentation, this way:

$doc = new DOMDocument('1.0', 'UTF-8');

But that doesn't help (I wonder why). Then I found this answer on SO. The final solution is this:

$doc->loadHTML('<?xml encoding="UTF-8">' . $_POST["data"]);

Although it works, it is a trick. So the question remains: how do you do it the right way? If somebody has the definitive answer, they are very welcome to post it.
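
For reference, the workaround can be wrapped up as in the following minimal sketch. It assumes the POSTed fragment is already UTF-8 and that the wanted text is everything inside the implied body element; the helper name extractPlainText and the sample fragment are hypothetical. One likely reason the constructor argument has no effect is that it describes the encoding of a newly created document, while loadHTML() decodes its input separately and falls back to a single-byte encoding when the markup declares none.

```php
<?php
// Hypothetical helper: name and signature are illustrative, not from the answer.
function extractPlainText(string $htmlFragment): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // The prefix from the answer makes the parser treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $htmlFragment);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps a fragment in implied <html><body> elements.
    $body = $doc->getElementsByTagName('body')->item(0);

    return $body === null ? '' : $body->textContent;
}

// "\u{00C4}" is the letter Ä written as an escape, so the example does not
// depend on the encoding this file is saved in.
$plainText = extractPlainText("<div><p>\u{00C4}rger</p></div>");
```

The result can then be passed to file_put_contents() as before. PHP writes the bytes of the string unchanged, so the file is UTF-8 if the string is.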

Ben

You need to make sure that the content is encoded consistently throughout its roundtrip from user input to server-side storage and back to the browser again.

I would recommend using UTF-8. Check that your HTML document (which includes the contenteditable zone) is UTF-8 encoded, and that the XMLHttpRequest/Ajax request does not specify a different encoding when it sends the content to the server.

Check that your server-side application encodes the text file as UTF-8 also. And check that the HTTP response headers declare the file's encoding as UTF-8 when the file is requested and downloaded in the browser.

Somewhere along this path, the encoding differs, and that is what is causing the error. iconv converts between different encodings, which should not be necessary if everything is consistent.
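
To find the step where the encoding changes, it can help to look at the raw bytes at each stage, for example right after receiving the POST data and again just before writing the file. A rough sketch, where inspectBytes is a hypothetical helper and the sample strings are made up for illustration:

```php
<?php
// Hypothetical debugging helper: reports the raw bytes of a string and
// whether those bytes form valid UTF-8.
function inspectBytes(string $text): array
{
    return [
        'hex'        => bin2hex($text),
        // With the /u modifier, preg_match() fails on invalid UTF-8 input.
        'valid_utf8' => preg_match('//u', $text) === 1,
    ];
}

$utf8    = inspectBytes("\u{00C4}");          // Ä encoded as UTF-8
$latin1  = inspectBytes("\xC4");              // Ä as a single ISO-8859-1 byte
$doubled = inspectBytes("\xC3\x83\xC2\x84");  // UTF-8 bytes misread as ISO-8859-1, then encoded again
```

A single Ä should show up as c384. A lone c4 points to ISO-8859-1 data, and c383c284 points to text that was converted to UTF-8 twice. The last case still counts as valid UTF-8, which is why checking validity alone is not enough.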

Good luck!

kieranpotts
  • Thanks for your hints. The HTML document which contains the `contenteditable` is declared as ``. Can this be overridden? The XMLHttpRequest/Ajax request has no specific encoding, except 'encodeURIComponent'. So UTF-8 should pass through. On the server side I have no special encoding/decoding setup. How do I do that? Where do I declare the HTTP response headers for downloading? – Ben Jul 18 '15 at 11:47
  • The `Content-Type` HTTP header in the response will override any character encoding declared in the HTML document itself, e.g. using the `` tag. Check in your console for a `Content-Type` header value of "text/html; charset=utf-8" for the web page and "text/plain; charset=utf-8" for the file download. If these are wrong, correct them in your server-side application with PHP's `header()` function: http://php.net/header :) – kieranpotts Jul 18 '15 at 12:07
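
As a rough illustration of the comment above, a download script could send the headers like this. It is only a sketch: the function names, the file name handling and the Content-Disposition header are assumptions, not something taken from the thread.

```php
<?php
// Hypothetical helper: builds the header lines for a UTF-8 plain-text download.
function textDownloadHeaders(string $downloadName): array
{
    // Keep only a conservative set of characters in the suggested file name.
    $safeName = preg_replace('/[^A-Za-z0-9._-]/', '_', $downloadName);

    return [
        'Content-Type: text/plain; charset=utf-8',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
    ];
}

// Hypothetical helper: sends the headers, then the file itself.
function sendTextFile(string $path, string $downloadName): void
{
    // header() must be called before any output is sent.
    foreach (textDownloadHeaders($downloadName) as $headerLine) {
        header($headerLine);
    }

    // readfile() outputs the stored bytes unchanged.
    readfile($path);
}
```

A call such as sendTextFile($txtFile, 'notes.txt') would then deliver the saved file. The charset in the header only labels the bytes, so the file itself still has to be stored as UTF-8.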