2

I am using the RODBC package to read data from SQL server. R is reading the Chinese characters as "?????" I have passed the parameter DBMSencoding = "UTF-8" to the odbcConnect function.

Following is the sample code I am using:

Connection <- odbcConnect("abc", uid = "123", pwd = "123", 
                          DBMSencoding = "UTF-8", readOnlyOptimize=T)

Var1 <- sqlQuery(Connection, query, errors = TRUE, stringsAsFactors=F)

May be I didn't pass the arguments the way I am supposed to?

sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-12

loaded via a namespace (and not attached):
[1] tools_3.2.3

odbcGetInfo(mainConnection)
         DBMS_Name               DBMS_Ver        Driver_ODBC_Ver      Data_Source_Name            Driver_Name 
        "Microsoft SQL Server"  "10.50.4000"     "03.52"                                         "SQLSRV32.DLL" 
        Driver_Ver               ODBC_Ver            Server_Name 
      "06.01.7601"           "03.80.0000"            
Hack-R
  • 22,422
  • 14
  • 75
  • 131
Sumedh
  • 4,835
  • 2
  • 17
  • 32
  • Did you try without specifying `DBMSencoding`? Unicode might be default. – pneumatics Jan 25 '16 at 16:47
  • Yes, I did. I included the parameter only when the default wasn't working. – Sumedh Jan 25 '16 at 16:52
  • Does `DBMSEncoding="UCS-2"` do any better? – pneumatics Jan 25 '16 at 17:34
  • Nope. The data doesn't get read, at all. It gives this error: Error in iconv(query, to = enc) : embedded nul in string: 'þÿ\0S\0E\0L\0E\0C\0T\0 – Sumedh Jan 25 '16 at 17:43
  • 1
    Correction: When I don't specify DBMSencoding - only the chinese characters are not read. – Sumedh Jan 25 '16 at 17:45
  • Ok. No `DBMSEncoding`, French ok, Chinese broken. More information about your setup will be helpful. Please share the output of `sessionInfo()`, and would you please share the output of `odbcGetInfo(connection)`, and the data types of the text columns from `sqlColumns(connection)`. – pneumatics Jan 25 '16 at 17:54
  • @pneumatics: Kindly find attached the link in the beginning of the post – Sumedh Jan 25 '16 at 18:18
  • The column in question is "Stringvalue" – Sumedh Jan 25 '16 at 18:20
  • Thanks for providing that information. Editing the question to add it would be helpful, so others don't need to click a link, and for posterity so that people coming by later can know the situation. Is `StringValue` the only column with non-English words in it? – pneumatics Jan 25 '16 at 18:33
  • Yes, StringValue is the only column with non-english words in it. – Sumedh Jan 25 '16 at 18:46
  • Are you having this problem in RStudio? – pneumatics Jan 25 '16 at 19:08
  • Yes, I am using RStudio. – Sumedh Jan 25 '16 at 19:13
  • Would you please try to replicate the problem using an R shell? You can open one with the Tools --> Shell ... menu. – pneumatics Jan 25 '16 at 19:16
  • I get the same error when running through R console. – Sumedh Jan 25 '16 at 19:37
  • Did you prefix the "StringValue" field with `N` when it was inserted? – pneumatics Jan 25 '16 at 19:51
  • I am afraid I don't know how to do that. – Sumedh Jan 25 '16 at 20:01
  • Can you share the SQL DDL statement that was used to insert this data into SQL Server? If you don't prepend the strings with N, like `N'中華民族'` instead of a bare string constant like `'中華民族'`, then you may not get the correct characters back out. – pneumatics Jan 25 '16 at 21:36
  • @pneumatics: I too am unaware of how the data was inserted in the SQL server. What I know for sure is when the data was retrieved in Excel, the values showed up just fine. – Sumedh Jan 25 '16 at 22:29
  • Ok, that it works in Excel is helpful information. – pneumatics Jan 25 '16 at 22:43
  • 1
    @Sumedh Regarding *Error in iconv(query, to = enc) : embedded nul in string: 'þÿ\0S\0E\0L\0E\0C\0T\0 *: **There is an open bug in R that prevents UTF-16/UCS2 conversions under certain conditions** (which may be the case here): https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16737 – R Yoda Feb 15 '17 at 07:58

3 Answers3

2

Check the database's character encoding:

select userenv('language') from dual;
SIMPLIFIED CHINESE_CHINA.AL32UTF8 

Change your Environment Variable NLS_LANG before connecting to the database:

Sys.setenv(NLS_LANG="SIMPLIFIED CHINESE_CHINA.AL32UTF8")
Connection <- odbcConnect("abc", uid = "123", pwd = "123", DBMSencoding = "UTF-8", readOnlyOptimize=T)
Nissa
  • 4,636
  • 8
  • 29
  • 37
xhj
  • 21
  • 2
1

R on Windows has a lot of problems displaying characters outside of ASCII, even though it is often faithfully representing them internally. There is a lot of information in this answer about why this is the case, and some simple diagnostics in this answer. First try plotting, like:

# first, make sure plotting Chinese works in general
# (i.e., you have an appropriate font)
hanzi <- "漢字"
plot(1, 1, type="n")
text(1, 1, hanzi)

If that works, replace the hanzi <- "漢字" line with your sql query line to get some Chinese text from your database into a string variable, and try plotting that. If it shows up on the plot, then the characters are being read fine and represented internally fine, and the problem is just displaying them in the console. If plotting worked for the "漢字" string variable but doesn't work for your SQL-extracted string, then at least you know that the problem is actually with the SQL part and not just with display in the console.

Community
  • 1
  • 1
drammock
  • 2,373
  • 29
  • 40
  • Thank you for sharing this approach. It seems like the problem is with the SQL part. – Sumedh Jan 26 '16 at 12:43
  • @Hack-R, any suggestions on how to improve? Though the data is retrieved fine in Excel when using SQL, it is retrieved as `????` in R. – Sumedh Jul 14 '16 at 01:22
  • @Sumedh I don't know. I just thought you were saying it's a SQL problem not an R problem. I use Chinese data but I don't have this problem if the encoding is correct. – Hack-R Jul 14 '16 at 01:36
  • Hey All, I am currently facing the same issue as reported by the OP , this thread seems to be the latest on this issue. Any solutions on this please ? – dagan Oct 24 '16 at 08:59
0

I got the same problem and successfully solved it. It was quite simple. Go to Control Panel --> Region and Language --> Administrative --> Change system locale --> Chinese.

pzhao
  • 335
  • 1
  • 10