2

I am trying to query data from a MySQL database that contains some strings. For the connection and data retrieval I am using RMySQL in R, which works fine apart from one thing: the strings I retrieve do not seem to be in UTF-8. I need UTF-8 because these strings contain German umlauts. When I ask the database which encodings it uses with

dbGetQuery(db, "SHOW VARIABLES LIKE 'character_set_%';")

I get the desired answer:

             Variable_name           Value
1   character_set_client             utf8
2   character_set_connection         utf8
3   character_set_database           utf8
4   character_set_filesystem         binary
5    character_set_results           utf8
6     character_set_server           utf8
7     character_set_system           utf8
8       character_sets_dir C:\\Program Files\\MySQL\\MySQL Server 5.7\\share\\charsets\\

But I receive, for example,

Andreas Wünsche

instead of

Andreas Wünsche

I hope somebody knows how to deal with this. If additional information is needed, just ask and I can provide it.

stephan mc

3 Answers

3

I found something that is a bit tricky but works for me:

You have to manually mark the character columns of your data frame as UTF-8, like this:

x <- "Wünsche"
Encoding(x) <- "UTF-8"
x
[1] "Wünsche"

I think you have to do this for all of your string vectors.
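Marking every character column by hand gets tedious, so here is a minimal sketch of a helper that does it for a whole data frame. The name `mark_utf8` and the sample data frame are hypothetical and only for illustration; the string is built from raw UTF-8 bytes to imitate what a driver might return without an encoding mark, so no database is needed to try it.

```r
# Imitate a value that arrives as UTF-8 bytes but without an encoding mark.
# The bytes spell "Wünsche" in UTF-8 (0xc3 0xbc is "ü").
raw_name <- rawToChar(as.raw(c(0x57, 0xc3, 0xbc, 0x6e, 0x73, 0x63, 0x68, 0x65)))
df <- data.frame(id = 1L, name = raw_name, stringsAsFactors = FALSE)

# Hypothetical helper: declare every character column to be UTF-8.
# Encoding<- only sets the mark; it does not convert any bytes.
mark_utf8 <- function(d) {
  is_chr <- vapply(d, is.character, logical(1))
  d[is_chr] <- lapply(d[is_chr], function(col) {
    Encoding(col) <- "UTF-8"
    col
  })
  d
}

df <- mark_utf8(df)
Encoding(df$name)
```

This only helps when the bytes really are UTF-8. If the data was already converted twice on the way in, marking it will not repair it.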

EDIT :

Take a look here.
It seems to fix the same problem by sending 'set character set "utf8"' through dbSendQuery().

Christophe D.
2

I took this answer from: https://stat.ethz.ch/pipermail/r-sig-db/2012q1/001141.html

Before dbSendQuery you have to call dbGetQuery(mydb, "SET NAMES 'utf8'"):

library(RMySQL)

mydb <- dbConnect(MySQL(), user = db_user, password = db_password,
                  dbname = db_name, host = db_host, port = db_port)

# Tell the server to use utf8 for this connection before querying
dbGetQuery(mydb, "SET NAMES 'utf8'")

query <- paste0("select * from ", db_table)
rs <- dbSendQuery(mydb, query)
df <- fetch(rs, n = -1)
dbClearResult(rs)
baitmbarek
Joao Fonseca
0

When trying to use utf8/utf8mb4, if you see Mojibake, check the following. This discussion also applies to Double Encoding, which is not necessarily visible.

  • The bytes to be stored need to be utf8-encoded.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
  • HTML should start with <meta charset=UTF-8>.
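To make the Mojibake in the question concrete, the following sketch reproduces it in R without a database. It assumes the usual cause, namely UTF-8 bytes being read as latin1; the variable names are only illustrative.

```r
# "Wünsche" as UTF-8 bytes; the "ü" occupies two bytes (0xc3 0xbc).
utf8_bytes <- charToRaw(enc2utf8("W\u00fcnsche"))

# Misread the same bytes as latin1: each byte now counts as one character.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"
mojibake <- enc2utf8(misread)

# Read the bytes correctly by marking them as UTF-8 instead.
correct <- rawToChar(utf8_bytes)
Encoding(correct) <- "UTF-8"

nchar(mojibake)  # one character longer than the correct string
nchar(correct)
```

The misread string is one character longer because the two bytes of "ü" are shown as two separate latin1 characters, which is the pattern seen in "Andreas Wünsche".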
Rick James