I have a MySQL database with German "Umlaute öäü" in a table and I need to write a Go app that reads the table, encode it to ISO-8859-1 and write it to a text file.
So far so good, but the encoding to iso-8859-1 is not working. I have tried to debug this.
Here some details and information:
MySQL The MySQL database is UTF8, also the table itself. Also other character sets should be fine, except the character_set_server, but I think this is not relevant here, it should be just a default for new databases as far as I know.
When I query the database with the following SQL, I get the correct UTF8 encoded text:
select street, hex(street) from test_table where id = '36'
Result: (in real it is called Fröbelstraße)
Fröbelstraße, 4672C3B662656C73747261C39F65
So from the hex string it is basically exact what I have expected. OK.
Go App Just the relevant parts....
db, err := sql.Open("mysql", "...<connection string>...")
res, err := db.Query("select street from from test_table where id = '36'")
for res.Next() {
var pb Phonebook
err := res.Scan(&pb.Street)
fmt.Println(hex.EncodeToString([]byte(pb.Street)))
}
The output is
4672c383c2b662656c73747261c383c5b865
And that's the problem why my encoding to ISO-8859-1 is not working because the string from the database is not correct. The hex from the db direct query is correct and also working the the encoding.
But I don't understand why I get a different string from the go client.
In the original string "Fröbelstraße" are 2 characters "ö" which is C3B6 and "ß" which is C39F. The hex from the query with a db client is ok, but the one from the go app is too long, because I get a 2 bytes more per character.
When I feed my latin1 converter, with the correct hex string, it is working fine, I get an iso-8859-1 string. But not from the other one I query directly from Go.
I do this with
d := charmap.ISO8859_1.NewEncoder()
out, err := d.String(inp)
Also just a snippet, I actually call a function with a string, but I never got the correct iso8859-1 result. So I tried it with the hex code from the MySQL client query, converted back to a string, and bingo the iso8859-1 result is correct.
I also tried to query from python and get there the same strange hex from the queried string, so I am completely lost of what is wrong here. Cannot be go, because it is the same in python. But in the db it is stored correctly in my opinion and the MySQL character sets are all set the utf8mb4 or utf8, except the one I mentioned in above.