Read SPSS file with Cyrillic text into R

Question

I am trying to read several SPSS files that contain Cyrillic text into R. For most of them, the console reports "re-encoding from CP1251", and they read in without problems. For some, however, it reports "re-encoding from CP1252", which I think is a Latin-script code page, and the Cyrillic text turns into gibberish. I've tried the foreign, haven and Hmisc packages for reading the SPSS files, and none of them handled these files correctly. I've also tried passing reencode='utf-8'; with that, all of the Cyrillic text becomes NA. The problem occurs whether I'm working in plain R or in RStudio.

library(foreign)

x1 <- read.spss("cp1251_file.sav", to.data.frame = TRUE) # CP1251 file reads in fine

x2 <- read.spss("cp1252_file.sav", to.data.frame = TRUE) # CP1252 file becomes gibberish

x2 <- read.spss("cp1252_file.sav", to.data.frame = TRUE, reencode = 'utf-8') # Cyrillic text in CP1252 file becomes NA

Thanks for your help.

For me it works for German umlauts (üäö) with a combination of the following: `options(encoding = "UTF-8"); spssfile <- as.data.set(spss.system.file('yourfiles.sav')); spssfile <- Iconv(spssfile, from = "UTF-8", to = "UTF-8")`. Can you check those? — Jan, Jul 06 '17 at 03:52
this question/answers may also be helpful: https://stackoverflow.com/questions/3136293/read-spss-file-into-r?rq=1 — Jan, Jul 06 '17 at 04:00
Thank you. I've tried this and now I get an error when I try to convert to a data frame: `spssfile <- as.data.set(spss.system.file('file.sav', use.value.labels = FALSE)); spssfile <- Iconv(spssfile, from = "UTF-8", to = "UTF-8"); df <- as.data.frame(spssfile, stringsAsFactors = FALSE)` fails with `Error in as.factor(x) : Duplicate labels`. — ab27, Jul 06 '17 at 04:47
Looks like it works if I tell R that the file is CP1251 even though it thinks it is CP1252. Thanks! `df <- spss.system.file("file.sav"); df <- Iconv(df, from = "CP1251", to = "UTF-8"); df1 <- as.data.frame(as.data.set(df))` — ab27, Jul 06 '17 at 18:30

score 0 · Answer 1 · answered Jul 06 '17 at 18:35

It works if I use the memisc package and tell R that the file is CP1251, even though read.spss reports it as CP1252:

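library(memisc)

df <- spss.system.file("file.sav")                # open the file as a memisc importer object
df <- Iconv(df, from = "CP1251", to = "UTF-8")    # treat the stored text as CP1251, convert to UTF-8
df1 <- as.data.frame(as.data.set(df))             # load the data and convert to a data frame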
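
For anyone wondering why the wrong code page produces gibberish, here is a minimal sketch in base R, with no SPSS file involved. It assumes the platform's iconv knows the names "CP1251" and "CP1252" (usual on Windows, Linux and macOS, but not guaranteed), and the word used is just an illustration, not data from the question. The same bytes are decoded once as CP1252 and once as CP1251, and the last step repairs text that has already been mis-decoded.

```r
# Build the word "Privet" in Cyrillic from Unicode code points,
# so this script itself stays ASCII-only.
word <- intToUtf8(c(0x41F, 0x440, 0x438, 0x432, 0x435, 0x442))

# The bytes a CP1251-encoded file would store for that word
# (a list holding one raw vector).
bytes <- iconv(word, from = "UTF-8", to = "CP1251", toRaw = TRUE)

# Decoding those bytes with the wrong code page yields accented Latin letters.
wrong <- iconv(bytes, from = "CP1252", to = "UTF-8")

# Decoding them as CP1251 yields the original Cyrillic text.
right <- iconv(bytes, from = "CP1251", to = "UTF-8")

# Text that was already mis-decoded can be repaired by undoing the CP1252 step
# and decoding the recovered bytes as CP1251.
repaired <- iconv(iconv(wrong, from = "UTF-8", to = "CP1252", toRaw = TRUE),
                  from = "CP1251", to = "UTF-8")

print(c(wrong = wrong, right = right, repaired = repaired))
```

If the repaired text looks right, the same encoding name can probably be given when reading the file: as far as I know, read.spss in foreign accepts a character string for reencode (for example reencode = "CP1251") to say which encoding the file is in, and read_sav in haven has an encoding argument. That would also explain the NA values in the question, since reencode = 'utf-8' tells R the file is already UTF-8.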
