4

I am struggling to display Japanese characters in a data frame whose contents were retrieved from a MySQL database using the RMySQL package. The characters display fine if I paste a string from the database into a variable, which then has the Encoding 'UTF-8'. The Encoding of the data frame column, however, is 'unknown', and I haven't managed to change it using iconv.

A line of the database is:

1.00    20120801    4520000000000.00    1.00    379.00  142.00  北日本フード スーパー極上キムチ 330g

Retrieved using:

# Send the query, then fetch all rows (n = -1 means no row limit)
rs <- dbSendQuery(con, "select * from sales")
data <- fetch(rs, n = -1)

First row of data:

1     1 20120801 4.52e+12        1   379    142 ?????????????????????\r
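One possible reason the Encoding is reported as 'unknown' and iconv has no effect: if the column really contains literal question marks, it is plain ASCII, and R never marks ASCII strings with a declared encoding. A minimal sketch, assuming the value is exactly the string printed above (the variable names are made up for illustration):

```r
# Assumption: the Product value is the literal string shown in the output above.
product <- "?????????????????????\r"

# ASCII-only strings always report "unknown", whatever their origin.
Encoding(product)

# Declaring an encoding on an ASCII string is silently ignored.
marked <- product
Encoding(marked) <- "UTF-8"
Encoding(marked)

# Converting with iconv() leaves the question marks untouched: the
# original Japanese bytes are no longer there to be converted.
converted <- iconv(product, from = "latin1", to = "UTF-8")
identical(converted, product)
```

If this is what is happening, no amount of re-encoding on the R side can help, and the fix has to be applied before the data reaches R.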

Any help greatly appreciated.

Rick James
jgh781
  • Perhaps you can `dput()` the actual R object so we can copy and paste it to re-create it and help you. Unless we are able to reproduce the problem, it's very difficult to help. Also, since we're working with encoding stuff here, please specify your operating system, default locale, and GUI version. – MrFlick Sep 03 '14 at 01:17
  • `structure(list(Store = 1, Date = "20120801", JAN_code = 4.52e+12, Quantity = 1, Value = 379, Profit = 142, Product = "?????????????????????\r"), .Names = c("Store", "Date", "JAN_code", "Quantity", "Value", "Profit", "Product"), row.names = 1L, class = "data.frame")`. I'm using a Mac, RStudio, R 3.1; the system locale is 'en_GB.UTF-8'. – jgh781 Sep 03 '14 at 01:29
  • Ah, so by the time the value gets to R, it has already been destroyed. If you're seeing all those "???", then the original UTF data must have been lost by that point. So the loss must be happening at the `fetch` step or before. Since I don't have access to MySQL i can't really test anything. But perhaps this question might be helpful: http://stackoverflow.com/questions/12869778/fetching-utf-8-text-from-mysql-in-r-returns – MrFlick Sep 03 '14 at 01:36
  • Thanks for the quick response. Actually I think the data has been retrieved OK as I can paste it into excel (from R) and it displays fine – jgh781 Sep 03 '14 at 01:40
  • When you annotate "First row of data" there _must_ have been code that was executed (and you did not display the code.) – IRTFM Sep 03 '14 at 02:03
  • I just ran >data[1,] – jgh781 Sep 03 '14 at 02:07
  • So you're saying you can copy the "?????????????????????\r" and paste into excel and get completely different values (ie the Japanese characters show back up)?!? What does `charToRaw(data[1,7])` return? – MrFlick Sep 03 '14 at 02:56
  • Sorry, you're right. It is in the retrieval. The problem seems to be fixed by adding the following line before retrieving the data: `rs <- dbSendQuery(con, 'set character set "utf8"')`. Thanks for your help. – jgh781 Sep 03 '14 at 06:18
  • @jgh781 If you've solved your own problem, please post your solution as an answer below to help others in the future and to close out the question so it doesn't appear as unanswered. – MrFlick Sep 03 '14 at 15:05
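
The `charToRaw()` check suggested in the comments can be sketched as follows. This is a minimal illustration, not part of the original thread: `is_all_question_marks()` is a hypothetical helper, and the sample strings stand in for a value fetched from the database.

```r
# A literal question mark is the single byte 0x3f (decimal 63), whereas
# Japanese text encoded as UTF-8 uses several bytes per character.
lost <- "?????????\r"            # what a damaged value looks like
kept <- "\u5317\u65e5\u672c"     # the first three characters of the product name

charToRaw(lost)   # every byte is 3f, followed by 0d for the trailing \r
charToRaw(kept)   # three bytes per character

# Hypothetical helper: TRUE when a string holds nothing but question marks
# once spaces, carriage returns and newlines are ignored.
is_all_question_marks <- function(x) {
  codes <- as.integer(charToRaw(x))
  codes <- codes[!codes %in% c(13L, 10L, 32L)]
  length(codes) > 0 && all(codes == 63L)
}

is_all_question_marks(lost)
is_all_question_marks(kept)
```

If the helper returns TRUE for a fetched value, the text was already replaced by question marks before it reached R, so the connection or the stored data is the place to look.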

3 Answers

11

The problem seems to be fixed by adding the following line before retrieving the data:

rs <- dbSendQuery(con, 'set character set "utf8"')
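
Put together with the query from the question, the whole retrieval might look like the sketch below. The function name `fetch_sales_utf8` is made up for illustration, `con` is assumed to be an already open RMySQL connection, and the `Product` column name is taken from the `dput()` output in the comments.

```r
# Sketch only: `con` is assumed to be an open RMySQL connection,
# e.g. one created with DBI::dbConnect(RMySQL::MySQL(), ...).
fetch_sales_utf8 <- function(con) {
  # Ask the server to send results to this session as utf8,
  # then release the (empty) result set.
  rs <- DBI::dbSendQuery(con, 'set character set "utf8"')
  DBI::dbClearResult(rs)

  # Run the original query and fetch every row.
  data <- DBI::dbGetQuery(con, "select * from sales")

  # Assumption: the text column is called Product. Marking it as UTF-8
  # is probably unnecessary in a UTF-8 locale, but it makes the encoding explicit.
  Encoding(data$Product) <- "UTF-8"
  data
}

# Usage, once a connection exists:
# data <- fetch_sales_utf8(con)
# data[1, ]
```

The statement has to run on the same connection, and before the `select`, because it only affects the current session.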
jgh781
0

When trying to use utf8/utf8mb4, if you see question marks (regular ones, not black diamonds), check the following:

  • The bytes to be stored are not encoded as utf8. Fix this.
  • The column in the database is CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is utf8.
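
From R, the column and connection checks in the list above could be run roughly as follows. This is a sketch under assumptions: `check_charset` is a hypothetical helper, `con` is an open DBI connection to MySQL, and the table name `sales` comes from the question.

```r
# Sketch only: `con` is assumed to be an open DBI connection to MySQL.
# The table name is pasted into the SQL as-is, so pass trusted names only.
check_charset <- function(con, table = "sales") {
  list(
    # Session settings: character_set_client, character_set_connection
    # and character_set_results should all report utf8 or utf8mb4.
    session = DBI::dbGetQuery(con, "SHOW VARIABLES LIKE 'character_set%'"),
    # Table definition: shows the CHARACTER SET of the table and its columns.
    table = DBI::dbGetQuery(con, paste0("SHOW CREATE TABLE `", table, "`"))
  )
}

# Usage, once a connection exists:
# info <- check_charset(con)
# info$session
# info$table
```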
Rick James
  • I am experiencing a similar problem. How do I check whether the connection during reading is utf-8? – Pawels Mar 03 '22 at 14:31
  • @Pawels - Start a new Question and spell out more info. Are you using R? What connection parameters are you using? Is it a problem with typing utf8 characters? Or retrieving them? See https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Mar 03 '22 at 15:54
0

For me, just replacing the RMySQL::MySQL() driver with RMariaDB::MariaDB() solved the problem.

Thanks to this post.
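
A minimal sketch of the driver swap, assuming the RMariaDB package is installed. The wrapper name `connect_mariadb` and the connection details in the usage lines are placeholders, not values from the question.

```r
# Sketch only: the same DBI call as before, with the driver object swapped.
connect_mariadb <- function(dbname, host, username, password) {
  DBI::dbConnect(
    RMariaDB::MariaDB(),   # instead of RMySQL::MySQL()
    dbname = dbname,
    host = host,
    username = username,
    password = password
  )
}

# Usage with placeholder credentials:
# con <- connect_mariadb("mydb", "localhost", "me", "secret")
# data <- DBI::dbGetQuery(con, "select * from sales")
# DBI::dbDisconnect(con)
```

Because both packages implement the DBI interface, the rest of the code (queries, fetching, disconnecting) should not need to change.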

user438383