1

What is causing this incorrect string? I have read many questions and answers and here are my results. I still am getting same error after reading answers.

I am getting the following error: ERROR 1366 (HY000) at line 34373: Incorrect string value: '\xEF\xBB\xBF<?x...' for column 'change' at row 1

When I try to enter the following into SQL: Line number 34373: INSERT INTO gitlog_changes VALUES ('123456', 'NhincCommonEntity.xsd', '<?xml version=\"1.0\" encoding=\"UTF-8\"?>');

My table looks like this:

DROP TABLE IF EXISTS `gitlog_changes`;
CREATE TABLE `gitlog_changes` (
  `hashID` varchar(40) NOT NULL,
  `filename` varchar(450) DEFAULT NULL,
  `change` mediumtext
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I read many answers that say change the charset to UTF8 [1][2][3][4]. So I execute this: alter table yourTableName DEFAULT CHARACTER SET utf8;

I continue to get the same error. Then I alter table yourTableName DEFAULT CHARACTER SET utf8mb4_general_ci;

Still same error occurs.

I also attempt to read in a file from python and do direct commits to the database. From this answer[1]. I get a warning instead of an error.

I insert the following code into my python script:

    cursor.execute("SET NAMES 'utf8'")
    cursor.execute("SET CHARACTER SET utf8")

Python script:

def insert_changes(modList):
    db = MySQLdb.connect("localhost", "user", "password", "table")
    cursor = db.cursor()

    cursor.execute("SET NAMES 'utf8'")
    cursor.execute("SET CHARACTER SET utf8")

    for mod in modList:
        hashID = mod["hashID"]
        fileName = mod["fileName"]
        change = mod["change"]

        cursor.execute("INSERT INTO gitlog_changes VALUES (%s, %s, %s" , (hashID, fileName, change))
    # # disconnect from server
    db.commit()
    db.close()

The warning I get here is: Warning: Invalid utf8 character string: '\xEF\xBB\xBF<?x...' cursor.execute("INSERT INTO gitlog_changes VALUES (%s, %s, %s)" , (hashID, fileName, change))

Community
  • 1
  • 1
Whitecat
  • 3,882
  • 7
  • 48
  • 78

2 Answers2

2

Text you're trying to insert contains UTF-8 BOM in the beginning (that's the \xEF\xBB\xBF in your error).

Please check this answer to see how to convert from UTF-8 with BOM into UTF-8.


As stated in MySQL docs

MySQL uses no BOM for UTF-8 values.

So the only solution is decoding this string in your python code.

Community
  • 1
  • 1
Outshined
  • 709
  • 7
  • 22
  • What do I change in the database? – Whitecat Oct 04 '16 at 21:39
  • @Whitecat As I mentioned in edit, the only way is to decode the string inside your python code – Outshined Oct 04 '16 at 21:44
  • Correct me if I am wrong. All strings need to be edited to put them into SQL. I even have to edit the insert statements from the file if I want to use the file. – Whitecat Oct 04 '16 at 21:48
  • The reason I avoided decode and encode is this answer: http://stackoverflow.com/questions/1168036/how-to-fix-incorrect-string-value-errors says it is a bad idea. – Whitecat Oct 04 '16 at 21:49
  • Arg. I get the error `'utf8' codec can't decode byte 0xe9 in position 77: invalid continuation byte` – Whitecat Oct 04 '16 at 21:57
  • @Whitecat Ok, I added a link to an answer how to read a string in utf-8 bom encoding, this should remove the bom bytes from the string. I didn't know it differs that much from reading utf-16 bom :) – Outshined Oct 04 '16 at 22:22
2

The string you're trying to insert into db has an unusual character at its beginning. I just copied your string:

In [1]: a = '<'

In [2]: a
Out[2]: '\xef\xbb\xbf<'

You need to get rid of those characters. This is a good post explaining what these characters are.

Community
  • 1
  • 1
afsafzal
  • 592
  • 5
  • 15