0

Here's my situation.

I'm migrating from one server to another. As part of this, I'm moving across database data.

The migration method involved running the same CREATE TABLE query on the new server, then using a series of INSERT commands to insert the data row by row. It's possible this resulted in different data, however, the CHECKSUM command was used to validate the contents. CHECKSUM was done on the whole table after the transfer, on a new table with that row isolated, and after truncation of the string by applying the LEFT operator. Every time, the result was identical between the old and new server, indicating the raw data should be exactly identical at the byte level.

CHECKSUM TABLE `test`

I've checked the structure and it's exactly the same as well.

SHOW CREATE TABLE `test`

Here is the structure:

CREATE TABLE test ( item varchar(32) COLLATE utf8_unicode_ci NOT NULL, amount mediumint(5) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The field is of type:

`item` varchar(32) COLLATE utf8_unicode_ci NOT NULL

Here is my connection code in PHP:

$sql = new mysqli($db_host, $db_user, $db_pass, $db_name);
if ($sql->connect_error) {
  die('Connect Error ('.$sql->connect_errno.') '.$sql->connect_error);
}

When I go to retrieve the data in PHP with a simple query:

SELECT * FROM `test`

The data displays like this:

§lO

On the old server/host, I get this sequence of raw bytes:

Decimal: -194-167-108-79-
HEX: -C2-A7-6C-4F-

And on the new server, I get a couple of extra bytes at the beginning:

Decimal: -195-130-194-167-108-79-
HEX: -C3-82-C2-A7-6C-4F-

Why might the exact same raw data, table structure, and query, return a different result between the two servers? What should I do to ensure that results are as consistent as possible in the future?

azoundria
  • 940
  • 1
  • 8
  • 24
  • check the charset on the databases, just run on both ` SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "database_name"` – Alberto Sinigaglia Nov 16 '19 at 22:31
  • Thanks for your help. In both cases the result I receive is "DEFAULT_CHARACTER_SET_NAME" being "utf8". – azoundria Nov 16 '19 at 23:39
  • can you please post the create table of that table? – Alberto Sinigaglia Nov 16 '19 at 23:41
  • In both cases "CREATE TABLE `test` ( `item` varchar(32) COLLATE utf8_unicode_ci NOT NULL, `amount` mediumint(5) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci" – azoundria Nov 16 '19 at 23:42
  • how are you sure that the data in one table is the same as the other one? – Alberto Sinigaglia Nov 16 '19 at 23:46
  • Please provide it in hex. – Rick James Nov 17 '19 at 05:43
  • @AlbertoSinigaglia - The `DEFAULT` is irrelevant. You need the `CHARACTER SET` for the _column_. Easy way: `SHOW CREATE TABLE`. – Rick James Nov 17 '19 at 05:52
  • 1
    `latin1` was probably involved somewhere, such as the _connection_ parameters. – Rick James Nov 17 '19 at 05:52
  • @AlbertoSinigaglia As I stated, this is based on a checksum taken on the table in both servers. As a check against an unlikely collision scenario, I also repeated with removing some characters from both, and still get the same checksum result. – azoundria Nov 18 '19 at 17:38
  • @Rick James I've added hex translations of the raw bytes to my question. The `SHOW CREATE TABLE` is also posted above. The MYSQLI connection code is the same in both cases, and I included it. Thanks so much for your help. – azoundria Nov 18 '19 at 17:42

1 Answers1

1

§lO is "Mojibake" for §lO. I presume the latter (3-character) is "correct"?

The raw data looks like this (in both cases when I display it)

is bogus because the technique used for displaying it probably messed with the encoding.

Since the 3 characters became 4 and then became 6, you probably have "double-encoding".

This discusses how "double encoding" can occur: Trouble with UTF-8 characters; what I see is not what I stored

If you provide some more info (CREATE TABLE, hex, method of migrating the data, etc), we may be able to further unravel the mess you have.

More

When using mysqli, do $sql->set_charset('utf8');

(The HEX confirms my analysis.)

The migration method involved running the same CREATE TABLE query on the new server

Was it preceded by some character set settings, as in mysqldump?

then using a series of INSERT commands to insert the data row by row.

Can you get the HEX of some accented character in the file?

... CHECKSUM ...

OK, being the same rules out one thing.

CHECKSUM was done on ... a new table with that row isolated

How did you do that? SELECTing the row could have modified the text, thereby invalidating the test.

indicating the raw data should be exactly identical at the byte level.

For checking the data in the table, SELECT HEX(col)... is the only way to bypass all possible character set conversions that could happen. Please provide the HEX for some column with a non-ascii character (such as the example given). And do the CHECKSUM against the HEX output.

And provide SHOW VARIABLES LIKE 'char%';

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thank you very much for your help. I have added all the details you requested to the question. I am currently reading the other question you have provided. I appreciate your help. Please let me know if any other details would be useful. – azoundria Nov 18 '19 at 17:52
  • @azoundria - I added More. – Rick James Nov 18 '19 at 20:02
  • I want to thank you so much for your attempt at help. What happened was I simply timed out in the migration and the old web host account is now offline so I have no way to know the discrepancy. I had nothing more to say but thanks. – azoundria Jan 17 '20 at 05:40