1

I'm having trouble changing the "locale" when using the DBI package in R. More specifically, my database contains characters such as "é, è, ê, ..." and DBI does not read them correctly.

When I perform the same query using the RODBC package, it does get me the correct result. My question: how can I get the correct result (i.e. reading "é, è, ê, ..." correctly) with the DBI package?

This should be a reproducible example:

sql <- "select * from myDatabase.dbo.myTable"
# Where myTable contains any of the difficult characters

# Try with DBI
library(odbc)
library(DBI)

conDBI <- dbConnect(
  odbc::odbc(),
  dsn   = "myDsn",
  UID   = myLogin,
  PWD   = myPassword,
  Port  = 1433
)

table_DBI <- dbGetQuery(conDBI, sql)

# Try with RODBC
library(RODBC)

conRODBC <- odbcConnect(
  "myDsn",
  uid = myLogin,
  pwd = myPassword
)

table_RODBC <- sqlQuery(conRODBC, sql)

In case this is relevant, here's my session info:

> sessionInfo()
R version 3.4.0 Patched (2017-06-02 r72765)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252    LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C                  
[5] LC_TIME=Dutch_Belgium.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  tools     methods   base     

other attached packages:
[1] RODBC_1.3-15 DBI_0.6-1    odbc_1.1.1  

loaded via a namespace (and not attached):
[1] bit_1.1-12     compiler_3.4.0 hms_0.3        tibble_1.3.3   Rcpp_0.12.11   bit64_0.9-7    blob_1.1.0     rlang_0.1.1   
Scarabee
  • 5,437
  • 5
  • 29
  • 55
Willem
  • 976
  • 9
  • 24
  • Have you looked at this: https://stackoverflow.com/questions/38363566/trouble-with-utf-8-characters-what-i-see-is-not-what-i-stored ? – James Thomas Durant Oct 04 '17 at 13:00
  • 1
    This seems to indicate I need to change the way the data is stored in the database. But I'm using a company-wide database that is not under my control. So I need a solution that just fixes it (using the DBI package) on my side. This should be possible since the RODBC package does exactly that. – Willem Oct 04 '17 at 13:45
  • Take a look at http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues for how to pass a hex characterset. Maybe that will work? – James Thomas Durant Oct 04 '17 at 14:09
  • Is it possible this explanation is only valid for MySQL? Does it work for an SQL Server database? In any case it looks extremely elaborate... I'm not a whiz kid, so was hoping for a simpler solution ;-). If one exists of course. – Willem Oct 05 '17 at 12:45
  • Have you tried using the encoding argument in the dbConnect function? It should be the text encoding used on the Database. If the database is the same as your local encoding set to "". See iconvlist() for a complete list of available encodings on your system. Note strings are always returned UTF-8 encoded. You might try: conDBI <- dbConnect( odbc::odbc(), dsn = "myDsn", UID = myLogin, PWD = myPassword, Port = 1433, encoding = "latin1" ) – James Thomas Durant Oct 05 '17 at 19:26
  • That fixed it! Thanks very much :-). Can I upvote your comment, or do you need to make a separate reply? (As you can see this is my first ever question.) – Willem Oct 06 '17 at 08:31
  • Hi Willem - added the answer if it is working upvote the answer. – James Thomas Durant Oct 09 '17 at 19:50
  • Apparently I can't upvote because I have less than 15 reputation... But I marked it as the accepted answer. Thanks again! – Willem Oct 11 '17 at 10:04

1 Answers1

0

Have you tried using the encoding argument in the dbConnect function? It should be the text encoding used on the Database. "If the database is the same as your local encoding set to "". See iconvlist() for a complete list of available encodings on your system. Note strings are always returned UTF-8 encoded."

You might try:

conDBI <- dbConnect( 
odbc::odbc(), 
dsn = "myDsn", 
UID = myLogin, 
PWD = myPassword, 
Port = 1433, 
encoding = "latin1" )

HTH James