0

I'm using this code:

library(RODBC)
sql3 <- "SELECT TOP 10 Address AS Addr 
  FROM dbo.Address 
  Where CountryCode = 'RU'"
con <- odbcDriverConnect('driver={SQL SERVER};server=servername;database=databasename;trusted_connection=true')
df <- as.data.frame(sqlQuery(con,sql3),stringsASFactors=FALSE)
print(df)

Which produces the following results:

> print(df)
                                                                 Addr
    1                                 115573, ??????, ???????? ?-?, ?.22?
    2                            107113 ??????, ????? ???????????, ??? 26
    3                         142200 ??, ?.????????, ??????????? ?., ?. 1
    4                   614022 ?????, ?????????????? ?????, ??. ????, 37?
    5              109453 ?. ?????? ????????????? ????????, ?. 19, ???. 2
    6                                  129282 ??????, ??. ???????, ?.13-?
    7                         603000 ?????? ????????, ??????? ????????, 2
    8                    103164 ??????, ????? ??????? ???????????, ??? 26
    9             197341, ?????-?????????, ??-? ???????????, ?.19, ????.2
    10 429950, ?????????? ???????, ?. ??????????????, ??. ??????????, 42?

The results should be a list of Russian addresses.

As you can maybe see, all 'regular' characters are getting imported fine (e.g. numbers), but the Russian characters aren't making it. I'm guessing I somehow need to set the character encoding before it reaches the dataframe, but I'm not sure how to do that. Also to clarify, the correct address characters appear when the data is queried from SSMS.

Any pointers would be appreciated, thanks.

L Robin
  • 55
  • 7
  • Perhaps this might help [Fetching UTF-8 text from MySQL in R returns “????”](http://stackoverflow.com/questions/12869778/fetching-utf-8-text-from-mysql-in-r-returns) – Miha Aug 02 '16 at 19:11
  • Searching `?sqlQuery` for "encoding" points to `?odbcConnect`, which mentions the `DMSencoding` parameter. Try setting that to `"UTF-8"` when you connect. – Gregor Thomas Aug 02 '16 at 19:13
  • It also has the helpful note *"If it is possible to set the DBMS or ODBC driver to communicate in the character set of the R session then this should be done. For example, MySQL can set the communication character set via SQL, e.g. SET NAMES 'utf8'."*, which may or may not apply for SQL Server. – Gregor Thomas Aug 02 '16 at 19:14
  • I've tried utilizing both of your suggestions, charset=utf8 and DBMSencoding, but no luck. After further digging I'm wondering if the issue lies in `?sqlQuery` in the `as.is` argument @Miha @Gregor – L Robin Aug 02 '16 at 19:48

1 Answers1

0

The ? will be returned for non-english characters unless you specify N forcing unicode.

SELECT 'ук ферт хер'     --returns ?? ???? ???
SELECT N'ук ферт хер'    --returns ук ферт хер

Most importantly, this has to be done on the insert...

drop table #country
create table #country (names nvarchar(50))
insert into #country(names) values (N'Россия'),('Россия')

SELECT names FROM #country

--Results
--------------------------
names
--------------------------
Россия
??????
-------------------------
S3S
  • 24,809
  • 5
  • 26
  • 45
  • I've used that before for some dynamic sql, but I don't think it applies in this case. In my code Select TOP 10 N'Address1'... would just return 10 rows of Address1. As far as inserting the data, it's already encoded correctly in sql server, something is just going amok when it goes into the R environment. – L Robin Aug 02 '16 at 19:28
  • So if you select the data in SSMS it returns correctly @LRobin – S3S Aug 02 '16 at 19:32