3

On a database I'll have to store names and such in UTF8, and hashes in latin1_bin. I called SET NAMES utf8, but I noticed that it corrupted the latin1 fields when I tried to read them (I was able to write them just fine). Which is odd, since if I understood correctly that query is only about sending data to the server, not receiving it.

phpMyAdmin displays broken data too.

Any clue about what I might be doing wrong?

(using MAMP 1.9.6)

edit: this answer specifies this is also the charset used to send data back to the client. I'm getting confused: what's the point of specifying the charset of a column if that will be ignored anyway?

edit: excerpt from the column definition:

  `tok` char(64) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  `sal` char(16) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
...
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_roman_ci ;

excerpt from queries:

SELECT tok,sal FROM user WHERE id=4 LIMIT 1

.

INSERT INTO user (tok, sal) VALUES (x'1387ea0c22277d3000bd23241c357e3a9ba45a2e28f50581d63a73bf785a7458a95cca4de27d0a86588f5bdfa94415d6a255c2c0379ebc2f00dacba03ae6b866', x'8fca28a592c29f245ff0a3ba5f97420c')
Community
  • 1
  • 1
o0'.
  • 11,739
  • 19
  • 60
  • 87
  • How are your hash columns defined, and what do your read/write queries look like? – Jon Oct 26 '11 at 15:11
  • @Jon: updated with excerpts from the columns and the queries – o0'. Oct 26 '11 at 15:19
  • How do you treat the incoming values when reading them? This should work: `echo bin2hex(iconv("UTF-8", "ISO-8859-1", $sal))`. – Jon Oct 26 '11 at 15:38
  • @Jon: I just get empty strings if I try to do that. – o0'. Oct 26 '11 at 16:25
  • @Jon: `mb_convert_encoding` converts it to *something* instead, but still not the right value. – o0'. Oct 26 '11 at 16:36
  • Doing the other way around, i.e. removing `SET NAMES`, then converting from latin1 to utf8 when retrieving the utf8 data, kind of worked. Of course it's highly undesirable, so the problem stands. – o0'. Oct 26 '11 at 17:04

3 Answers3

4

You should change your table definition to use MySql's BINARY type, which is the perfect fit for this kind of data:

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than nonbinary strings. That is, they contain byte strings rather than character strings. This means that they have no character set, and sorting and comparison are based on the numeric values of the bytes in the values.

The column definitions would become:

`tok` binary(64) NOT NULL,
`sal` binary(16) NOT NULL,
Jon
  • 428,835
  • 81
  • 738
  • 806
-1

set names ... only sets the character set of the connection (of the client side only!) for both sending and receiving the data. It is not related to the encoding of the fields in the database! It is only related to the encoding which you use as a client to the database, e.g. in PHP. The database engine will always convert the encoding between the encoding in tables to the encoding specified in set names ... query.

So in table definition you just specify encoding for each field as you need, and you don't change anything in the way how you used the set names ... command - it just remains the same as before.

Tomas
  • 57,621
  • 49
  • 238
  • 373
  • I don't get it: how can I mix different encodings then? – o0'. Oct 26 '11 at 14:56
  • @lohoris, just by specifying the encoding in the `create table` or `alter table` command. Then, during the client connection, the 2 different encodings will be converted to the connection encoding specified in `set names ...` (that implies that the connection encoding must be able to embrace all encodings used, so if you have very different encodings you might need to use utf-8 as connection encoding). – Tomas Oct 28 '11 at 13:38
  • why was this post downvoted? I use this approach many years and it works. It's part of a good culture to leave comment when downvoting. – Tomas Oct 28 '11 at 13:39
-1

Leaving SET NAMES on, and the INSERT query intact, I changed the SELECT query to:

SELECT BINARY(tok) AS tok, BINARY(sal) AS sal FROM user WHERE id=4 LIMIT 1

i.e. I casted the hash fields to binary.

While this worked, I'll leave this open in case someone provides a (perhaps more correct?) alternative.

o0'.
  • 11,739
  • 19
  • 60
  • 87
  • 1
    I thought that `sal binary(16) NOT NULL` on the table definition would be more correct but that it wouldn't make a difference here. Can you check if it does? If it solves the problem, that's IMHO "the" correct solution. – Jon Oct 27 '11 at 01:30
  • @Jon: yeah, that worked, thanks :) (would you post it as an answer so I can accept it?) – o0'. Oct 28 '11 at 07:31