I have some documents, formatted in XML. I want to store their contents (raw text, formatting preserved) in cells in an SQL table, as LONGTEXT, so that I can simply grab the value of a cell and load it in a webpage later. I am doing this via MySQL Workbench.

However, when I try to apply the additions to my table, I get error 1366: Incorrect string value: \xE2\x80\xAF1, ...

I tried changing the character set to utf8 (collation utf8_general_ci) and to cp1251, but I keep getting the same error.

Also, I searched the XML file for the string \xE2\x80\xAF1, but it's not even in the file.

Does anybody know what this string is?

The XML file is only 219KB so I think it should (very) easily fit in a LONGTEXT entry.

Does XML make use of any characters that could cause this error?

Am I missing another cause of the error?

nyedidikeke
  • You probably won't see literal `\xE2` in your file. That's a text representation of a binary value. Is your XML UTF-8 clean? Is your column *and* connection UTF-8? – tadman Sep 09 '16 at 01:57
  • Possible duplicate; consider checking this [answer](http://stackoverflow.com/a/1168099/6381711) for a quick resolution. – nyedidikeke Sep 09 '16 at 02:00
  • Possible duplicate of [How to fix "Incorrect string value" errors?](http://stackoverflow.com/questions/1168036/how-to-fix-incorrect-string-value-errors) – nyedidikeke Sep 09 '16 at 02:01

The value in the error message is not literal text: `\xE2\x80\xAF` is the UTF-8 byte sequence for U+202F NARROW NO-BREAK SPACE.
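A quick way to confirm this, sketched in Python (the sample string and the helper name `find_nnbsp` are hypothetical): decode the three escaped bytes as UTF-8 and look up the character's name, then search the file's raw bytes for that sequence rather than for the escaped text. The trailing `1` in the error message is presumably just the next ordinary character in the value.

```python
import unicodedata

# The error message prints raw bytes as \xNN escapes; these are the first three.
RAW = b"\xe2\x80\xaf"

# Decoding them as UTF-8 yields a single character.
ch = RAW.decode("utf-8")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))


def find_nnbsp(data: bytes) -> list[int]:
    """Return the byte offset of every E2 80 AF sequence in raw file contents."""
    offsets = []
    pos = data.find(RAW)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(RAW, pos + 1)
    return offsets


# Hypothetical fragment; for a real file, pass open(path, "rb").read() instead.
sample = "width:\u202f1 cm".encode("utf-8")
print(find_nnbsp(sample))
```

In a text editor the character usually looks like an ordinary space, which is why searching for the escaped string finds nothing.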

In UTF-8, plain ASCII characters are encoded as one byte; other characters need two, three, or four bytes.

This character needs three bytes. Multi-byte characters like it tend to cause such errors when the column or the connection does not use a Unicode character set.
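The byte counts can be checked directly. This Python sketch (sample characters chosen for illustration) encodes one character from each UTF-8 length class and prints its bytes:

```python
# One sample character per UTF-8 length class (chosen for illustration).
SAMPLES = [
    ("A", "ASCII letter"),
    ("\u00e9", "LATIN SMALL LETTER E WITH ACUTE"),
    ("\u202f", "NARROW NO-BREAK SPACE"),
    ("\U0001F600", "emoji outside the Basic Multilingual Plane"),
]

for ch, label in SAMPLES:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {label}: {len(encoded)} byte(s), {encoded.hex(' ')}")
```

U+202F lands in the three-byte class; only code points above U+FFFF take four bytes.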

Find a related question here: freebcp: "Unicode data is odd byte size for column. Should be even byte size"

Shnugo
  • The link is irrelevant since it is talking about SQL Server, not MySQL. As you say, `E280AF` is the _utf8_ encoding of a character; its _Unicode_ code point is `U+202F`. Also, utf8 _characters_ can be as big as 4 _bytes_. In MySQL, that requires `CHARACTER SET utf8mb4`. Most of Asia needs 3-byte utf8; Chinese needs some 4-byte characters. – Rick James Sep 09 '16 at 22:35
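Following the comment above, a Python sketch (helper names hypothetical) for checking whether a document actually contains 4-byte characters, i.e. code points above U+FFFF, which MySQL's 3-byte `utf8` (an alias of `utf8mb3`) cannot store but `utf8mb4` can:

```python
def max_utf8_bytes(text: str) -> int:
    """Largest number of UTF-8 bytes used by any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if text has a character above U+FFFF (4 bytes in UTF-8).

    Such characters need CHARACTER SET utf8mb4; utf8mb3 stops at 3 bytes.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


xml_fragment = "<p>width:\u202f1 cm</p>"  # hypothetical document content
print(max_utf8_bytes(xml_fragment), needs_utf8mb4(xml_fragment))
```

For a fragment whose widest character is U+202F, 3-byte `utf8` would suffice; `utf8mb4` is the safer choice when the content is not known in advance.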

You need to

  • specify utf8 (or utf8mb4) for the connection established from your client (Workbench).
  • declare the column in question to be CHARACTER SET utf8 (or utf8mb4).
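Why the character set matters can be illustrated outside MySQL. In this Python sketch, Python's codecs stand in for MySQL character sets (an analogy, not MySQL's own conversion code): `cp1251` is the set tried in the question, and `cp1252` is roughly what MySQL's `latin1` corresponds to. Neither contains U+202F, so a value holding it cannot be stored, much like error 1366, while `utf-8` accepts it. The helper `can_store` and the sample value are hypothetical.

```python
def can_store(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        bad = text[exc.start]  # first character the codec cannot represent
        print(f"{codec}: cannot represent U+{ord(bad):04X}")
        return False
    print(f"{codec}: OK")
    return True


value = "width:\u202f1 cm"  # hypothetical cell content

for codec in ("cp1251", "cp1252", "utf-8"):
    can_store(value, codec)
```

The same reasoning applies to both hops, which is why both the connection and the column need a Unicode character set.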
Rick James