1

Whilst doing some checking for a client to see if their site was still functioning well I found a random page that contained a bunch of weird characters like ¿½.

I think this has to do with the tables having a latin1 encoding instead of utf-8. But seeing as no other pages are affected that use the same table could there be another error. I did check if the text itself was safe and making sure it was just clean text.

So I have 2 questions, the main one being is it safe to just update this one table to utf8 and if not what causes this error and wondered why would this only affects one certain page. (Side note the website is built using typo3)
Ofcourse I have live example's the links are:
Site 1: With weird text characters
Site 2: Same table, but no weird characters

RDAxRoadkill
  • 414
  • 5
  • 24
  • 1
    3 steps: 1) Backup 2) Convert 3) Try/check/see if it's correct, if it failed -> Restore backup, if it succeeds -> Profit! Even shorter answer: it will **most probably** work. Also see [this answer](http://stackoverflow.com/a/6184788/215042) – RobIII Sep 30 '15 at 14:01
  • I suppose this would be a safe way to do so, yet I would not understand why page 1 has this error and page 2 does not whilst both using the same table so I would rather like to understand the error first before indeed doing the 3 steps wich you just said. – RDAxRoadkill Sep 30 '15 at 14:03
  • Maybe the first content was submitted with a different charset than the other? Maybe the first content was "imported" or got in the database in a different way than the second content? Also make sure the connection uses `SET NAMES utf8` or [mysql_set_charset](http://stackoverflow.com/a/1650834/215042) – RobIII Sep 30 '15 at 14:05
  • This could be the case, I have made sure to check if the content was imported but it does not seem this way, both have been typed into typo's backend text editor. But these are valid points and there is no way to check if they were submitted with a different charset I assume? – RDAxRoadkill Sep 30 '15 at 14:16

1 Answers1

1

Ultimately the client connecting to the database decides how their encodings are handled; that's known as the connection encoding. Whatever encoding the text is stored as in MySQL, it will be converted on the fly to/from the client's connection encoding. As such, just changing the underlying column's storage to utf8 doesn't affect anything.

However, that in itself also won't "fix" anything. The characters will still be garbage. You'll also have to convert the actual characters to the correct data. Otherwise you'll just have "¿½" stored encoded as utf8 instead of "¿½" stored encoded as latin1. And changing those characters will likely affect any other client which has been doing it wrong so far, so the client side needs to be fixed at the same time.

deceze
  • 510,633
  • 85
  • 743
  • 889
  • Well in the backend of typo3 everything is just the same and there are no weird text errors. So basically to fix this issue, i'd have to update the tables to `utf8` but whilst doing that also update the client's connection encoding? – RDAxRoadkill Sep 30 '15 at 14:14
  • The roundtrip to Typo3 works, but the data is screwed up in the database. See [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) for an explanation. Just changing the column's encoding to `utf8` won't actually do anything of significance. To really fix your data, you'll need to convert it from garbage to the actual characters it's supposed to be. And when doing that you'll also need to fix the client, which likely has the wrong connection encoding set, which caused the garbage in the first place. – deceze Sep 30 '15 at 14:17
  • I see, how would I go about changing the charset for the client? I have not done such a thing before – RDAxRoadkill Sep 30 '15 at 14:28
  • Depends on how the client connects. PDO example: `new PDO('mysql:...;charset=utf8mb4')` – deceze Sep 30 '15 at 14:29
  • I see, I shall have a search around for that, but I think I have enough to try for now. Thank you for your help, could I send you a message etc. If I have any more questions? – RDAxRoadkill Sep 30 '15 at 14:32
  • 1
    @RDAxRoadkill: If you have more questions the, [for the love of God, keep it public](https://xkcd.com/979/)! – RobIII Sep 30 '15 at 20:01