4

I am connecting to an Oracle database from R using ROracle. The problem is for every special utf-8 character it returns a question mark. Some Chinese values returns a solid string of question marks. I believe this is relevant because I haven't found any other question on this site (or others) that answers this for the package ROracle. Some questions that were the most promising include an answer for MySQL: Fetching UTF-8 text from MySQL in R returns "????" but I was unable to make this work for ROracle. This site also provided some useful information https://docs.oracle.com/cd/E17952_01/mysql-5.5-en/charset-connection.html Before I was using RODBC and was easily able to configure the uft-8 encoding.

Here is some sample code... I am sorry that unless you have an Oracle database with utf-8 characters it may be impossible to duplicate... I also changed the host number and the sid for data privacy reasons...

library(ROracle)
drv <- dbDriver("Oracle")
# Create the connection string
host <- "10.00.000.86"
port <- 1521
sid <- "f110"
connect.string <- paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
  "(CONNECT_DATA=(SID=", sid, ")))", sep = "")

con <- dbConnect(drv, username = "XXXXXXXXX",
                 password = "xxxxxxxxx",dbname=connect.string)

my.table <- dbReadTable(con, "DASH_D_PROJECT_INFO")

my.table[40, 1:3]  

   PROJECT_ID DATE_INPUT PROJECT_NAME
    211625   2012-07-01  ??????, ?????????????????? ????? ??????, 1869?1917 [????? 3]

Any help is appreciated. I have read the entire documentation of the ROracle packages, and it seemed to have a solution for writing utf-8 characters, but not for reading them.

Hossein Vatani
  • 1,381
  • 14
  • 26
farmkid
  • 420
  • 2
  • 11

1 Answers1

6

Okay after several weeks I found my own answer. I hope that it will be of value to someone else.

My question is largely answered by how Oracle stores the data. If you want UTF-8 characteristics preserverd you need the column in the table to be an NVARCHAR not just a varchar. At that point regular data pulling and encoding will work in R as expected. I was looking for the error in the wrong place.

I also want to mention one hang up on how to write utf-8 data from R to Oracle with utf-8

In writing files I had some that would not convert to UTF-8 in the following manner. So I did the step in too parts and wrote them in two steps to an oracle table. The results worked perfectly.

Encoding(my.data1$Project.Name) <- "UTF-8"

my.data1.1 <- my.data1[Encoding(my.data1$Project.Name) == "UTF-8", ]
my.data1.2 <- my.data1[Encoding(my.data1$Project.Name) != "UTF-8", ]
attr(my.data1.1$Project.Name, "ora.encoding") <- "UTF-8"

If you found this insightful give it an up vote so more can find it.

farmkid
  • 420
  • 2
  • 11