200

After noticing an application tended to discard random emails due to incorrect string value errors, I went though and switched many text columns to use the utf8 column charset and the default column collate (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting sql errors when it hit non-latin emails, too.

Despite this, some of the emails are still causing the program to hit incorrect string value errrors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)

The contents column is a MEDIUMTEXT datatybe which uses the utf8 column charset and the utf8_general_ci column collate. There are no flags that I can toggle in this column.

Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:

  • What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
  • How can I fix it?
  • What are the likely effects of such a fix?

One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.

Brian
  • 25,523
  • 18
  • 82
  • 173
  • 3
    Post-mortem: [RichieHindle's solution](http://stackoverflow.com/a/1168099/18192) resolved the problem and did not introduce any additional problems in the time it was running. It may have been a bit of a hack, but it worked, and allowed me to avoid getting my hands dirty with 3rd party software that I don't fully understand. At this point, we've updated to a newer version of the software/schema which handles all of these encoding issues properly (and is new enough that it's actually supported), rendering the hack unnecessary. – Brian May 29 '14 at 15:47
  • 1
    All those answers didn't get to the point. This elegant solution worked for me: https://dba.stackexchange.com/a/21684/259488. You can change the encoding to `utf8mb4` for either a database or a table with only a few lines of SQL to fix the issue. – Nikolas Oct 17 '22 at 03:21

26 Answers26

162

UPDATE to the below answer:

The time the question was asked, "UTF8" in MySQL meant utf8mb3. In the meantime, utf8mb4 was added, but to my knowledge MySQLs "UTF8" was not switched to mean utf8mb4.

That means, you'd need to specifically put "utf8mb4", if you mean it (and you should use utf8mb4)

I'll keep this here instead of just editing the answer, to make clear there is still a difference when saying "UTF8"

Original

I would not suggest Richies answer, because you are screwing up the data inside the database. You would not fix your problem but try to "hide" it and not being able to perform essential database operations with the crapped data.

If you encounter this error either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify, that the data source (a file, ...) really is UTF-8.

Then, check your database connection, you should do this after connecting:

SET NAMES 'utf8mb4';
SET CHARACTER SET utf8mb4;

Next, verify that the tables where the data is stored have the utf8mb4 character set:

SELECT
  `tables`.`TABLE_NAME`,
  `collations`.`character_set_name`
FROM
  `information_schema`.`TABLES` AS `tables`,
  `information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
  `tables`.`table_schema` = DATABASE()
  AND `collations`.`collation_name` = `tables`.`table_collation`
;

Last, check your database settings:

mysql> show variables like '%colla%';
mysql> show variables like '%charac%';

If source, transport and destination are utf8mb4, your problem is gone;)

nico gawenda
  • 3,648
  • 3
  • 25
  • 38
  • Instead of `SET CHARACTER_SET utf8` use `SET CHARACTER_SET_CLIENT utf8`. At least in version 5.1 it was necessary use the latter. – Kariem Aug 29 '13 at 14:48
  • 1
    @Kariem: This is strange, because this setting is covered by the SET NAMES command, which is equivalent to calling SET character_set_client, SET character_set_results, SET character_set_connection http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html – nico gawenda Aug 30 '13 at 12:08
  • Thank you for the information. I did not know that `set names` includes all those settings. However, I got an error when calling `SET CHARACTER_SET utf8;`: `ERROR 1193 (HY000): Unknown system variable 'CHARACTER_SET'` - if i look for variables (`show variables like '%character%';`) the result list does not contain `character_set`. That's why I suggested setting the variable `character_set_client`, not knowing that it's already included in `set names`. It might be something with my configuration: I just tested on Amazon RDS and locally on a XAMPP MySQL distribution - both show that error. – Kariem Sep 05 '13 at 21:56
  • 2
    The second command should be `SET CHARACTER SET utf8` (not CHARACTER_SET) – Coder Nov 11 '13 at 18:19
  • 7
    While this answers helps to investigate the problem, it doesn't answer what to do to fix it. I see "latin1" instead of "utf-8". – Vanuan May 21 '14 at 18:20
  • to fix character_set_database, use this query: `ALTER SCHEMA myschema DEFAULT CHARACTER SET utf8`. To fix character_set_server, search for that setting in the `my.ini` file. – Kip May 20 '15 at 18:06
  • 2
    this answer is great at explaining the problem but very poor at detailing the solution (which is what OP asked for). @nicogawenda: What are all the SQL queries to be run in order to completely fix the problem? How to fix all pre-existing data? – Clint Eastwood Nov 02 '16 at 19:15
  • 2
    "If source, transport and destination are UTF-8, your problem is gone;)" that was the trick for me – therealbigpepe Jan 20 '17 at 18:52
  • 1
    I had already done extensive research into the "Incorrect string value" error message when importing a db dump from v8 into v5.7. It was not until I found this answer, specifically with the "SET NAMES" info, that I was able to import successfully. Thank you! – u628898 May 17 '23 at 03:51
89

MySQL’s utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).

If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.

moeffju
  • 4,363
  • 2
  • 23
  • 22
  • 11
    I think this is likely the best fix. Upgrade to 5.5 and replace utf8 with utf8mb4 in the above answers. I was inserting utf8 data from Twitter which had emojis or other chars that needed 4 bytes. – rmarscher Jul 04 '14 at 00:38
  • Lets assume we are not going to upgrade to 5.5. How do we suppress the errors? – User Dec 28 '15 at 01:39
  • i scrolled way too far for this most useful answer – deqi Aug 10 '18 at 21:13
  • 3
    10 years since the original question. Let it be known that MySQL's utf8 encoding is not proper utf8. Use utf8mb4! Same goes for MariaDB. Otherwise you can't have tears of joy – Liam Apr 05 '19 at 15:31
  • This is the best fix. Even in 2021! – Displee Jun 10 '21 at 15:30
68

The table and fields have the wrong encoding; however, you can convert them to UTF-8.

ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;
Qix - MONICA WAS MISTREATED
  • 14,451
  • 16
  • 82
  • 145
Jiayu Wang
  • 905
  • 7
  • 5
  • 3
    I think this one is the correct answer of all. I have two tables have a utf8 varchar format each. one of it got the error, the other one is fine. even I user 'update select ' make a copy from the 'good' utf8 column to another table, the same error occurs. It's because the two tables are created in different versions of MySQL. – AiShiguang Jun 27 '17 at 12:04
  • Yes! It was misconfiguration from my database table too. I think this answer should be the correct one. My problem was that the collate selected was utf8_unicode_ci instead of utf8_general_ci. Thanks :) – Juan Rivillas Jan 30 '18 at 19:13
  • 3
    What is this answer doing down here, should be at the top – Sagun Shrestha Sep 07 '18 at 19:09
  • 2
    this one helps, it tells you what to try, instead of what may be wrong. – Victor Di Jun 04 '19 at 02:51
  • 1
    Thank you! It just helped me a lot I had changed the table collation ant I thinked that should be it but the fields were still ascii collation ... – Radu Jul 18 '19 at 09:14
  • 1
    For me, just the last query was sufficient (changing the column where the user input goes), though I set it to `... CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci` instead of `utf8` because of the answer currently above this one: https://stackoverflow.com/a/24559308/1201863 – Luc Nov 27 '19 at 12:56
  • 1
    Caution: That `ALTER` fixes certain issues, but mangles others. Don't blindly use it without further discussion. – Rick James Sep 04 '20 at 02:54
  • 1
    This really solves the problem!! Thanks a lot – Grumpy Civet May 25 '21 at 07:16
43

"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:

>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is the most permissive encoding there is - every byte value is a valid code point.

Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?

RichieHindle
  • 272,464
  • 47
  • 358
  • 399
  • 5
    What exactly do you mean by, "Of course it's not going to understand genuine UTF-8 any more?" – Brian Jul 22 '09 at 21:28
  • 5
    @Brian: If you tell it you're giving it cp1252, and you actually give it the UTF-8 for, say, `café`, it's going to misinterpret that as `café`. It won't crash, but it will misunderstand the high-bit characters. – RichieHindle Jul 22 '09 at 21:36
  • 3
    @Richie: The database can happily call the data whatever it wants, but if the php code that grabs it is stuffing it into a string, that won't make much difference...will it? I don't see exactly where the lack of understanding of UTF-8 is having an impact. – Brian Jul 22 '09 at 21:38
  • 7
    @Brian: No, you're right. The time it would make a difference would be within the database, for instance if you used a ORDER BY clause in your SQL - the sorting would be wonky where you had non-ASCII characters. – RichieHindle Jul 22 '09 at 21:45
  • I have a similar problem querying a mysql database on windows, but not on linux. – Kimball Robinson Jan 12 '15 at 21:56
  • @RichieHindle : Well, i would be glad to change the encoding to a Windows-1252 character set, but it will be a mess in all my database (generated and managed by my application) : do you have any other solution for that problem ? (like i don't know, shut down the warning messages ?) Thanks a lot – Gaelle Jul 25 '16 at 15:28
  • 1
    @Gaelle: If the database is "generated and managed by [your] application" then you should ensure that you only feed it valid UTF-8 and you won't see this problem. There's no way to "shut down" a message that's telling you you're doing something invalid - if you promise UTF-8 but deliver something else, the database will correctly reject it. – RichieHindle Jul 25 '16 at 17:20
  • 17
    Please unmark this answer as the solution, hiding an error is not the solution of anything. Remove the overheating lamp from your car and you will see. – David Vartanian Aug 15 '17 at 13:55
  • 1
    To whoever stumble on this answer : __IT'S JUST PLAIN WRONG ON ALL POINTS__. Even the "demonstration" that the string would not be "valid utf8" is plain wrong, whatever the python version (in 2.7 you should have used a unicode string instead, and in both cases the correct operation would have been to __encode__ to utf8. – bruno desthuilliers Mar 28 '18 at 10:05
  • Unfortunately there is a lot of incorrect information out there about MySQL's non-standard utf8 encoding. Please see @moeffju's answer if you want to store utf8 text correctly. – Liam Apr 05 '19 at 15:19
31

I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.

The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.

Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.

frankshaka
  • 611
  • 1
  • 7
  • 12
14

First check if your default_character_set_name is utf8.

SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";

If the result is not utf8 you must convert your database. At first you must save a dump.

To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:

mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME

To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:

ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;

You can now retry to to write utf8 character into your database. This solution help me when i try to upload 200000 row of csv file into my database.

Community
  • 1
  • 1
Babacar Gningue
  • 1,304
  • 1
  • 9
  • 10
9

Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.

ALTER TABLE tabale_name MODIFY COLUMN column_name VARCHAR(255)  
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
8

In general, this happens when you insert strings to columns with incompatible encoding/collation.

I got this error when I had TRIGGERs, which inherit server's collation for some reason. And mysql's default is (at least on Ubuntu) latin-1 with swedish collation. Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:

/etc/mysql/my.cnf :

[mysqld]
character-set-server=utf8
default-character-set=utf8

And this must list all triggers with utf8-*:

select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS

And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):

show variables like 'char%';
Ondra Žižka
  • 43,948
  • 41
  • 217
  • 277
5

That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.

In practice, the latter problem is caused by MySQL UTF-8 implementation that only supports UNICODE characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details. The trick is to use column type utf8mb4 instead of type utf8 which doesn't actually support all of UTF-8 despite the name. The former type is the correct type to use for all UTF-8 strings.

Mikko Rantalainen
  • 14,132
  • 10
  • 74
  • 112
5

I got a similar error (Incorrect string value: '\xD0\xBE\xDO\xB2. ...' for 'content' at row 1). I have tried to change character set of column to utf8mb4 and after that the error has changed to 'Data too long for column 'content' at row 1'.
It turned out that mysql shows me wrong error. I turned back character set of column to utf8 and changed type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way MariaDB in same case (I have tested the same INSERT there) just cut a text without error.

AVKurov
  • 156
  • 1
  • 2
  • 6
  • MySQL too I tired so many things, realized mysql doesn't support 4 byte utf-8 uncoding at this version and was dying trying to understand what's causing this. Changing the type apparently was the answer, an immediate solution. – Liza Mar 09 '17 at 15:33
3

In my case, Incorrect string value: '\xCC\x88'..., the problem was that an o-umlaut was in its decomposed state. This question-and-answer helped me understand the difference between and ö. In PHP, the fix for me was to use PHP's Normalizer library. E.g., Normalizer::normalize('o¨', Normalizer::FORM_C).

MM.
  • 1,966
  • 4
  • 20
  • 24
2

The solution for me when running into this Incorrect string value: '\xF8' for column error using scriptcase was to be sure that my database is set up for utf8 general ci and so are my field collations. Then when I do my data import of a csv file I load the csv into UE Studio then save it formatted as utf8 and Voila! It works like a charm, 29000 records in there no errors. Previously I was trying to import an excel created csv.

Yaroslav
  • 6,476
  • 10
  • 48
  • 89
mainebrain
  • 21
  • 3
2

I have tried all of the above solutions (which all bring valid points), but nothing was working for me.

Until I found that my MySQL table field mappings in C# was using an incorrect type: MySqlDbType.Blob . I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!

p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.

Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.

Hope this helps someone who is struggling to find a reason for the error.

2

If you happen to process the value with some string function before saving, make sure the function can properly handle multibyte characters. String functions that cannot do that and are, say, attempting to truncate might split one of the single multibyte characters in the middle, and that can cause such string error situations.

In PHP for instance, you would need to switch from substr to mb_substr.

WoodrowShigeru
  • 1,418
  • 1
  • 18
  • 25
1

I added binary before the column name and solve the charset error.

insert into tableA values(binary stringcolname1);

Richardhe2007
  • 379
  • 2
  • 3
1

Hi i also got this error when i use my online databases from godaddy server i think it has the mysql version of 5.1 or more. but when i do from my localhost server (version 5.7) it was fine after that i created the table from local server and copied to the online server using mysql yog i think the problem is with character set

Screenshot Here

jonmrich
  • 4,233
  • 5
  • 42
  • 94
1

To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).

metakermit
  • 21,267
  • 15
  • 86
  • 95
1

There's good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)

For me the error happened for the following field:

@Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;

This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:

package redacted;

import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;

import static java.lang.String.format;

public class UriConverter implements AttributeConverter<URI, String> {
    @Override
    public String convertToDatabaseColumn(URI attribute) {
        return attribute.toString();
    }

    @Override
    public URI convertToEntityAttribute(String field) {
        try {
            return new URI(field);
        }
        catch (URISyntaxException e) {
            throw new RuntimeException(format("could not convert database field to URI: %s", field));
        }
    }
}

Used as follows:

@Column(nullable = false, columnDefinition = "VARCHAR(255)")
@Convert(converter = UriConverter.class)
private URI consulUri;

As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).

Sander Verhagen
  • 8,540
  • 4
  • 41
  • 63
1

In my case that problem was solved by changing Mysql column encoding to 'binary' (data type will be changed automatically to VARBINARY). Probably I will not be able to filter or search with that column, but I'm no need for that.

Kurolesik
  • 11
  • 3
0

In my case ,first i've meet a '???' in my website, then i check Mysql's character set which is latin now ,so i change it into utf-8,then i restart my project ,then i got the same error with you , then i found that i forget to change the database's charset and change into utf-8, boom,it worked.

acoder2013
  • 392
  • 4
  • 12
0

I tried almost every steps mentioned here. None worked. Downloaded mariadb. It worked. I know this is not a solution yet this might help somebody to identify the problem quickly or give a temporary solution.

Server version: 10.2.10-MariaDB - MariaDB Server
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
cherankrish
  • 2,004
  • 1
  • 16
  • 11
0

I had a table with a varbinary column that I wanted to convert to utf8mb4 varchar. Unfortunately some of the existing data was invalid UTF-8 and the ALTER query returned Incorrect string value for various rows.

I tried every suggestion I could find regarding cast / convert / char_length = length etc. but nothing in SQL detected the erroneous values, other than the ALTER query returning bad rows one by one. I would love a pure SQL solution to remove the bad values. Sadly this solution is not pretty

I ended up select *'ing the entire table into PHP, where the erroneous rows could be detected en-masse by:

if (empty(htmlspecialchars($row['whatever'])))
Dark
  • 227
  • 3
  • 9
0

The problem can also be caused by the client if the charset is not set to utf8mb4. so even if every Database, Table and Column is set to utf8mb4 you will still get an error, for instance in PyCharm.

For Python, set the charset of the connection in the MySQL Connector connect method:

mydb = mysql.connector.connect(
    host="IP or Host",
    user="<user>",
    passwd="<password>",
    database="<yourDB>",
    # set charset to utf8mb4 to support emojis
    charset='utf8mb4'
)
Moses
  • 209
  • 3
  • 7
0

I know i`m late to the ball but someone else might come accross the problem i had with this and be happy to read my workaround.

I have come accross this problem with french characters. turns out i the text I was copying had encoding the accents on some charaatcers as 2 chars and others as single chars...

i couldn`t find how to set my table to accept the strings so i ended up changing the diacritics in my text import. here is a list of them as double characters to search for them in your texts.

ùòìàè
áéíóú
ûôêâî
ç
0

For information, you can change the charset of your database but alos table per table AND column by column. I have been stuck for hours searching why my UTF-8 table couldn't take any special characters...Juste this column was still in latin1. Check it out !

-2

1 - You have to declare in your connection the propertie of enconding UTF8. http://php.net/manual/en/mysqli.set-charset.php.

2 - If you are using mysql commando line to execute a script, you have to use the flag, like: Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

Roger Gusmao
  • 3,788
  • 1
  • 20
  • 17