
I'm preparing an SPSS .sav file with survey data for analysis in R. My problem is that some variables with binary values 0/1 (signifying no/yes) have been transformed unexpectedly.

I have used the memisc package to import the data as a data.set object.

library(memisc)

# Import the SPSS system file; the path is elided as in the original post
Dset.core <- spss.system.file(file="C://..../data_coded.sav",
                            varlab.file=NULL,
                            codes.file=NULL,
                            missval.file=NULL,
                            count.cases=TRUE,
                            to.lower=FALSE
)

This all worked fine, judging from the str() and codebook() output. One example of a 0/1 variable, $AMEVYES (labels are 0=no, 1=yes), is shown here:

str(Dset.core)

Data set with 1999 obs. of 106 variables:

(...)
$ AMEVYES : Nmnl. item w/ 2 labels for 0,1 num 0 0 0 0 0 0 0 0 0 1 ...

I now want to convert the special data.set object created by memisc into a data frame with:

Dset2Df.core <- as.data.frame(Dset.core)

As intended, the nominal 0/1 variable was changed into a factor variable with corresponding levels. But for some strange reason, the conversion also seems to have changed the values of the variable from 0/1 to 1/2, as in this example output:

str(Dset2Df.core) 

'data.frame': 1999 obs. of 106 variables:

(...)
$ AMEVYES : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...

Why did this happen, and most importantly, how can I stop this from happening? Many thanks for a hint!

PS: I'm rather new to R and new to this forum, so please excuse me if I missed any best practices when formulating my question.

Marco K
  • Stumbled upon this question while looking for something else. Regarding your question: we are missing a [practical reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Instead of all 1999 rows of 106 variables, why not show 5 rows, e.g. `head(Dset2Df.core$AMEVYES, 5)`? Also, the desired output is unclear: do you want to keep `"Yes", "No"` AND `0,1`? – Shique Apr 04 '19 at 08:54

1 Answer


As The Carpentries states:

Factors are stored as integers, and have labels associated with these unique integers. While factors look (and often behave) like character vectors, they are actually integers under the hood, and you need to be careful when treating them like strings.

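Factors are internally stored as integer codes starting from 1, one code per level. The 1 and 2 that str() prints are these internal codes (1 = first level "No", 2 = second level "Yes"), not your data values, so nothing was lost in the conversion. You cannot change the internal codes themselves. You can, however, change the labels attached to them, for example from ("No", "Yes") to ("0", "1").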
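To make this concrete, here is a minimal base-R sketch. It does not use memisc or your actual file: `raw` and `amevyes` are made-up stand-ins for the AMEVYES column, assuming the converted factor has its levels in the order "No", "Yes" as in your str() output.

```r
# Hypothetical stand-in for AMEVYES after as.data.frame():
# a factor with labels "No"/"Yes" built from 0/1 values.
raw <- c(0, 0, 0, 1, 0, 1)
amevyes <- factor(raw, levels = c(0, 1), labels = c("No", "Yes"))

# str() shows the internal integer codes (1 and 2), not the original 0/1.
str(amevyes)

# The internal codes are the positions of each value within levels(amevyes).
codes <- as.integer(amevyes)

# Option 1: shift the codes back to 0/1.
# This relies on the level order being "No", "Yes".
back_to_01 <- as.integer(amevyes) - 1L

# Option 2: compare against the label, which does not depend on level order.
is_yes <- as.integer(amevyes == "Yes")

# Option 3: relabel the levels as "0"/"1", then convert via character.
# as.numeric() directly on a factor would return the codes 1/2 again.
as_01_factor <- factor(amevyes, levels = c("No", "Yes"), labels = c("0", "1"))
numeric_again <- as.numeric(as.character(as_01_factor))
```

Which option fits depends on whether you want to keep the "No"/"Yes" labels for tables and plots (keep the factor, and derive a 0/1 column only where a numeric value is needed) or only need the numbers.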

Shique