
I have a .sav file from SPSS containing data from a survey conducted by means of a questionnaire. I tried to open this .sav file in R, but I can't replicate the structure of the original data file. In the original file, variables have both values and value labels. For example, the variable "satisfaction with XY" has the values 1, 2 and 3 with the labels 1 = "satisfied", 2 = "both satisfied and unsatisfied" and 3 = "unsatisfied".

I found out that the "memisc" package can replicate this structure by creating "item" variables. For a single variable, the code looks like this:

labels(data$XY) <- c("satisfied"                      = 1,
                     "both satisfied and unsatisfied" = 2,
                     "unsatisfied"                    = 3)

What I don't know is how to apply this to a selected set of variables (not just one, and not all of them).

Jignesh Sutar
tmfmnk
  • how did you read in the `.sav` file ? – mtoto Feb 22 '16 at 19:44
  • @mtoto this way: `data <- as.data.set(spss.system.file("path"))` – tmfmnk Feb 22 '16 at 19:58
  • It's not clear what you mean. If you have a list of column names, then you can loop like `for (n in mycolnames) labels(data[[n]]) <- c(chuffed = 1, "pissed right off" = 2)` – Frank Feb 22 '16 at 20:29
  • @Frank I just need to create a function which will do the same as this `labels(data$XY) <- c("satisfied" = 1, "both satisfied and unsatisfied" = 2, "unsatisfied" = 3)` but for all of the variables I select. – tmfmnk Feb 22 '16 at 20:40
  • Yes, and why doesn't a loop work? – Frank Feb 22 '16 at 20:42
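The loop suggested in the comments can be sketched end to end. This is a base R variant, assuming the data sit in an ordinary data.frame rather than a memisc data.set: each selected column is turned into a factor whose levels carry the value labels. The helper `apply_value_labels()`, the column names `XY`, `XZ` and `age`, and the toy data are hypothetical. With memisc itself, the loop body would presumably be `labels(data[[v]]) <- sat_labels` instead; that variant is not shown or tested here.

```r
# Hypothetical helper: recode the selected columns of a data.frame into factors
# using one shared set of value labels (names = label text, values = codes).
apply_value_labels <- function(df, vars, value_labels) {
  for (v in vars) {
    df[[v]] <- factor(df[[v]],
                      levels = unname(value_labels),
                      labels = names(value_labels))
  }
  df
}

sat_labels <- c("satisfied"                      = 1,
                "both satisfied and unsatisfied" = 2,
                "unsatisfied"                    = 3)

# Toy data: XY and XZ share the satisfaction scale, age does not
survey <- data.frame(XY = c(1, 3, 2), XZ = c(2, 2, 1), age = c(34, 51, 27))

# Only the columns named in the second argument are touched
survey <- apply_value_labels(survey, c("XY", "XZ"), sat_labels)
```

Codes that do not appear in `value_labels` become `NA`, and `factor()` replaces the numeric codes, so keep a copy of the raw columns if you still need them.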

1 Answer


The haven package works well with SPSS survey data. Here's a workflow:

library(haven)
# Load data
dat <- read_spss("your_SPSS_file.sav")

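# Identify the labelled variables; is.labelled() checks haven's labelled class,
# and vapply() guarantees a plain logical vector (one TRUE/FALSE per column)
labeled_vars <- vapply(dat, is.labelled, logical(1))

# Convert only the labelled vars to factors (list-style dat[...] indexing, no [, ])
dat[labeled_vars] <- lapply(dat[labeled_vars], as_factor)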

This creates factors whose levels follow the order of the underlying numeric codes and take their names from the SPSS value labels.
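Since the real file can fail to load, it may help to try the conversion on a tiny in-memory stand-in first. This sketch assumes a reasonably recent haven; the column names `id` and `XY` and their values are made up for illustration, and `labelled()` is used only to mimic what `read_spss()` would return.

```r
library(haven)

# Toy stand-in for read_spss() output: one plain column, one labelled column
dat <- data.frame(id = 1:4)
dat$XY <- labelled(c(1, 3, 2, 1),
                   labels = c("satisfied"                      = 1,
                              "both satisfied and unsatisfied" = 2,
                              "unsatisfied"                    = 3))

# TRUE only for columns carrying haven's labelled class
labeled_vars <- vapply(dat, is.labelled, logical(1))

# Convert just those columns; id is left as an integer column
dat[labeled_vars] <- lapply(dat[labeled_vars], as_factor)

levels(dat$XY)
str(dat)
```

If this runs but the same lines fail on the real data, the problem is more likely in what `read_spss()` returned for that file than in the conversion step.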

You can also handle SPSS variable labels (as opposed to variable names). Right after reading the SPSS data, before converting to factors, grab the variable labels:

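# Variable label where one exists, otherwise keep the column name (exact = TRUE avoids matching "labels")
var_names <- vapply(names(dat), function(n) { lab <- attr(dat[[n]], "label", exact = TRUE); if (is.null(lab)) n else lab }, character(1))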

Then assign them as the column names of the data frame produced by the code above:

names(dat) <- var_names
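One pitfall when reading variable labels with `attr(x, "label")`: haven keeps value labels in a separate `labels` attribute, and `attr()` falls back to a unique partial match unless `exact = TRUE`. On a column that has value labels but no variable label, a plain `attr(x, "label")` therefore returns the value labels. The base R sketch below shows this on a toy vector; the helper `label_or_name()` and the example data are hypothetical.

```r
# A column with value labels ("labels") but no variable label ("label"),
# mimicking how haven stores the two kinds of label as attributes
x <- c(1, 2, 1)
attr(x, "labels") <- c(satisfied = 1, unsatisfied = 2)

partial <- attr(x, "label")                # partial match: returns the "labels" attribute
exact   <- attr(x, "label", exact = TRUE)  # NULL: there is no variable label

# Hypothetical helper: variable label if present, otherwise the column name
label_or_name <- function(df) {
  vapply(names(df), function(n) {
    lab <- attr(df[[n]], "label", exact = TRUE)
    if (is.null(lab)) n else as.character(lab)
  }, character(1), USE.NAMES = FALSE)
}

df <- data.frame(id = 1:3)
df$XY <- x
attr(df$XY, "label") <- "Satisfaction with XY"

names(df) <- label_or_name(df)
```

Labels containing spaces are valid column names, but you will need backticks (for example ``df$`Satisfaction with XY` ``) to refer to them afterwards.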
Sam Firke
  • I tried it out but it doesn't work. After reading in the data I get this error: `Unrecognized record type 7, subtype 18 encountered in system file`, and after running `dat[, labeled_vars] <- lapply(dat[, labeled_vars], as_factor)` I get `incorrect number of dimensions`. – tmfmnk Feb 22 '16 at 21:07
  • Hrm, looks like a [common error](http://stackoverflow.com/questions/7691739/warning-error-when-importing-a-sav) and unrelated to the haven package in particular. If it's just a warning, you could check and see if the data has in fact imported with errors or not... – Sam Firke Feb 22 '16 at 21:14
  • I'm not sure what's causing that latter problem with your particular data set, sorry. – Sam Firke Feb 22 '16 at 21:19
  • I checked it out and they are imported. I can't say if there are errors. – tmfmnk Feb 22 '16 at 21:24