1

So my R uses cp1250 charset, sessionInfo() output:

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250    LC_MONETARY=Czech_Czech Republic.1250
[4] LC_NUMERIC=C                          LC_TIME=Czech_Czech Republic.1250

Now I want to work with MySQL database using dbplyr package. At first, when I connect to DB, I send the following mysql query:

SET NAMES 'cp1250';

Then when I send SELECT statement like this:

SELECT dg_group
FROM transpl
WHERE `dg_group` = 'Hodgkinův lymfom'

it returns me 0 rows. BUT! When I set character encoding of string 'Hodgkinův lymfom' to UTF-8, it returns me all the relevant rows. I set character encoding to UTF-8 like this:

x <- 'Hodgkinův lymfom'
Encoding(x) <- 'UTF-8'

Then the SELECT statement looks like this when I place variable x into the WHERE clause:

SELECT dg_group
FROM transpl
WHERE `dg_group` = 'Hodgkin<f9>v lymfom'

Although the transactions' encoding is cp1250, it is working with UTF-8 but not with cp1250.

By the way, when I make the following SELECT statement with SET NAMES 'cp1250', returned values in the rows are correctly displayed:

SELECT *
FROM transpl

Any idea on what can be wrong?

scarface
  • 574
  • 6
  • 20

1 Answers1

2

These charsets treat hex F9 as ů: cp1250, cp1256, dec8, latin1, latin2, latin5. For utf8/utf8mb4, it is hex C3B9

SET NAMES announces the charset of the client.

But what about the charset of the column you are storing into? SHOW CREATE TABLE to find out.

When doing SELECT col, HEX(col) ... do you get F9 or C3B9?

More discussion: Trouble with UTF-8 characters; what I see is not what I stored

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • At that time when I was writing my question, I did not really understand what is going on and why the described situation occurred instead of the "expected" one. I did not deal with understanding this thing at that time anymore. However, as I went through different experience in this field, I came across my question again and now I understand what is going on and so I marked your answer as the correct one. Thank you. – scarface Jun 18 '19 at 08:21