
I have a form in PHP that submits data to MySQL.

Looking at the data in the database, I can see that around 2-3% of the rows contain international characters that are encoded incorrectly, e.g. "Guðrún" displays as "GuÃ°rÃºn".

But another user might submit the same characters just a few minutes later, and in that case the characters are encoded correctly.

So it seems the encoding is dependent on the computer that is used or some other factor that I am unaware of.

In the head of the HTML, I have this:

<meta charset="ISO-8859-1">

The form has this:

<form autocomplete="on" method="post" action="index.php" id="form1" accept-charset="ISO-8859-1">

The MySQL columns are set to latin1_swedish_ci.

Is there something else I should be doing to make this work for everybody?

Edit: This was marked as a duplicate, but I can't find an answer to this question anywhere else. I've read through a lot of material on character encoding, which led me to my current setup, but that doesn't explain why 2-3% of the data behaves differently from the rest.

Andri
  • Better to use UTF-8 instead, imo. All it takes is one wrong charset setting in your application: *everything* needs to use the same charset! I have previously written [**an answer about UTF-8 encoding**](https://stackoverflow.com/a/31899827/4535200) that contains a short checklist covering *most* of the charset issues in a PHP/MySQL application. There's also a more in-depth topic, [**UTF-8 All the Way Through**](https://stackoverflow.com/q/279170/4535200). Most likely, you'll find a solution in one or both of these topics. – Qirel Oct 19 '18 at 18:28
  • I had it in UTF-8 before, but I've been changing this back and forth because I don't only need to submit the data to MySQL but also to a different database through a web service. I had other encoding problems there, and it couldn't be in UTF-8. This ended up being the best solution, except for the 2-3%. – Andri Oct 19 '18 at 19:31

1 Answer


This type of error is called Mojibake. Its causes are discussed here.

But... you seem to imply that some rows have Mojibake, while other rows have good accented characters? If this is the case, then it is a client error: some clients are using latin1, some are using utf8. It is not good to mix at this level.
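To make the mechanism concrete, here is a minimal sketch in Python (not PHP, only because it makes the bytes easy to inspect). It assumes one client sends UTF-8 bytes while the receiving side treats them as latin1. The helper name `misread_as_latin1` is hypothetical and only imitates that mismatch; it is not a MySQL or PHP API. Python's `latin-1` codec stands in for MySQL's latin1 here, which is an approximation that holds for these particular characters.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")


name = "Guðrún"

# Client sends UTF-8, but the bytes are interpreted as latin1: Mojibake.
garbled = misread_as_latin1(name)
print(garbled)  # GuÃ°rÃºn

# Client sends latin1 and the bytes are interpreted as latin1: no damage.
clean = name.encode("latin-1").decode("latin-1")
print(clean)  # Guðrún
```

Each of the two accented letters turns into two characters, which is the typical signature of UTF-8 bytes being read as a single-byte charset.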

However, if you do mix that way, be sure that each client announces the CHARACTER SET appropriate to the bytes it sends. This is best done via the connection parameters, but can also be done via SET NAMES .... Here are some rambling notes on PHP.

Since eth (ð) and u-acute (ú) both exist in latin1, the table column and/or the client could be set to either latin1 or utf8mb4. You might consider moving to utf8 to future-proof the database.

"Changing back and forth" can be dangerous -- especially if you use the 'wrong' ALTER. Please provide SELECT col, HEX(col) .... The hex for Guðrún:

if latin1:           47 75     F0    72     FA    6E
if utf8/utf8mb4:     47 75    C3B0   72    C3BA   6E
if 'double encoded': 47 75 C383 C2B0 72 C383 C2BA 6E
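
If it helps to check the HEX(col) output mechanically, the following Python sketch rebuilds the three byte sequences above and reports which one a given hex string matches. The function name `classify_hex` and its labels are made up for illustration, and it assumes you already know what the stored text is supposed to be and that the text can be represented in latin1.

```python
def classify_hex(hex_from_mysql: str, expected: str) -> str:
    """Compare HEX(col) output with the possible encodings of `expected`."""
    # Drop any spaces so output copied from a table can be pasted directly.
    raw = bytes.fromhex("".join(hex_from_mysql.split()))
    utf8 = expected.encode("utf-8")
    candidates = {
        # Raises UnicodeEncodeError if `expected` has no latin1 form.
        "latin1": expected.encode("latin-1"),
        "utf8/utf8mb4": utf8,
        # UTF-8 bytes misread as latin1, then encoded as UTF-8 again.
        "double encoded": utf8.decode("latin-1").encode("utf-8"),
    }
    for label, candidate in candidates.items():
        if raw == candidate:
            return label
    return "unknown"


print(classify_hex("47 75 F0 72 FA 6E", "Guðrún"))                      # latin1
print(classify_hex("47 75 C3B0 72 C3BA 6E", "Guðrún"))                  # utf8/utf8mb4
print(classify_hex("47 75 C383 C2B0 72 C383 C2BA 6E", "Guðrún"))        # double encoded
```

Running this against a few good rows and a few bad rows should show whether the bad 2-3% hold different bytes than the rest, or the same bytes that are merely displayed differently.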
Rick James