
As usual, I got an SPSS file that I've imported into R with the spss.get function from the Hmisc package. I'm bothered by the labelled class that Hmisc::spss.get adds to all variables in the data.frame, and I want to remove it.

The labelled class gives me headaches when I try to run ggplot, or even when I want to do some menial analysis! One solution would be to remove the labelled class from each variable in the data.frame. How can I do that? Is it possible at all? If not, what are my other options?

I really want to avoid re-editing variables "from scratch" with as.data.frame(lapply(x, as.numeric)) and as.character where applicable... And I certainly don't want to run SPSS and remove the labels manually (I don't like SPSS, nor do I care to install it)!

Thanks!

aL3xa

5 Answers


Here's how I get rid of the labels altogether. It is similar to Jyotirmoy's solution, but works for a vector as well as a data.frame. (Partial credit to Frank Harrell.)

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
    for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
  }
  else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

Use as follows:

my.unlabelled.df <- clear.labels(my.labelled.df)

EDIT

Here's a bit of a cleaner version of the function, same results:

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
      attr(x[[i]],"label") <- NULL
    } 
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
Dominic Comtois
  • I ran your code and it worked for me. But I also have a question. The function `is.list` returns `TRUE` when used on a `data.frame` whose class is `tbl_df`, `tbl` and `data.frame`. Do you understand why? – John Doe Jan 27 '20 at 14:36
  • A `data.frame` is a list even though its class is not explicitly "list". Try `is(data.frame())`, and also `is(tibble::tibble())`; we could say that both df's and tibbles are "constrained" lists. They are generally rectangular (exception being that a df can contain a matrix, which is pretty odd), and if you look at the source code for `data.frame`, you'll see it's made from lists (e.g. `return(structure(list(), names = character(), row.names = row.names, class = "data.frame"))`) – Dominic Comtois Jan 27 '20 at 15:46
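The point in the comment above can be checked in base R. This is a minimal sketch on a made-up data frame (the names `df`, `a` and `b` are only for illustration): the underlying type is a list even though "list" does not appear in the class vector.

```r
# Hypothetical two-column data frame, built only for this check.
df <- data.frame(a = 1:2, b = c("x", "y"))

df_is_list <- is.list(df)            # tests the underlying type
df_class <- class(df)                # the class attribute only
df_type <- typeof(df)                # internal storage type
df_inherits_list <- inherits(df, "list")  # looks at the class vector
```

Here `is.list(df)` is TRUE and `typeof(df)` is "list", while `inherits(df, "list")` is FALSE because the class vector contains only "data.frame".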

A belated note/warning regarding class membership in R objects. The correct way to identify "labelled" is not to test with an is function or with equality (==), but rather with inherits. Methods that test a specific position in the class vector will not pick up cases where the order of the existing classes is not the one assumed.

You can avoid creating "labelled" variables in spss.get with the argument use.value.labels=FALSE:
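A small sketch of the difference, using a hand-built vector rather than real spss.get output (the class name "special" and the label text are made up for illustration):

```r
# Hypothetical vector: "labelled" is present but is not the first class.
x <- structure(1:3, class = c("special", "labelled"), label = "Demo label")

# A position-based test only looks at the first entry of the class vector.
first_is_labelled <- class(x)[1] == "labelled"

# inherits() checks the whole class vector, regardless of order.
has_labelled <- inherits(x, "labelled")
```

The position-based test gives FALSE here, while `inherits` gives TRUE.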

w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))

The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor"), in which case it should have been:

class(x[[i]]) <- NULL  # no error from assignment of empty vector

The error you report can be reproduced with this code:

> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled'  atomic [1:3] 4 5 6
  ..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] : 
  invalid replacement object to be a class string
IRTFM

You can try out the read.spss function from the foreign package.

A rough-and-ready way to get rid of the labelled class created by spss.get:

for (i in 1:ncol(x)) {
    z<-class(x[[i]])
    if (z[[1]]=='labelled'){
       class(x[[i]])<-z[-1]
       attr(x[[i]],'label')<-NULL
    }
}

But can you please give an example where labelled causes problems?

If I have a variable MAED in a data frame x created by spss.get, I have:

> class(x$MAED)
[1] "labelled" "factor"  
> is.factor(x$MAED)
[1] TRUE

So well-written code that expects a factor (say) should not have any problems.

Jyotirmoy Bhattacharya
  • Actually, this approach doesn't remove the `labelled` class. Here's an error: `Error in class(x[[i]]) <- z[-1] : invalid replacement object to be a class string` – aL3xa Mar 10 '10 at 13:18
  • It worked with a SPSS file I tried. Can you please link to a sample file where this fails? Or give the output of for (i in 1:ncol(x)) print(class(x[[i]])) where x is the imported data frame. – Jyotirmoy Bhattacharya Mar 11 '10 at 04:18
  • Wouldn't you like to try `sapply(x, class)` instead of using a loop? Oh, and, sadly, I can't recall which data file I was using... It was so long ago... – aL3xa Mar 10 '11 at 01:24
  • An example of this being important is a c("labelled", "factor") class object [breaking dplyr functions](https://github.com/hadley/dplyr/issues/658) – Roger Filmyer Nov 30 '14 at 20:53
  • The code should use `inherits` and `setdiff` rather than the more fragile methods proposed above. – IRTFM Aug 12 '22 at 19:59
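One way the suggestion in the last comment could look is sketched below. The helper names `strip_labelled` and `clear_labelled` are hypothetical, and the demo data frame is built by hand instead of being read with spss.get, so treat this as an illustration under those assumptions rather than a tested replacement.

```r
# Hypothetical helper: drop the "labelled" class and the "label" attribute
# from one vector, leaving any other classes (e.g. "factor") intact.
strip_labelled <- function(v) {
  if (inherits(v, "labelled")) {
    remaining <- setdiff(oldClass(v), "labelled")
    # Assign NULL when nothing is left, so the vector falls back to its
    # implicit class (e.g. "integer") instead of an empty class vector.
    oldClass(v) <- if (length(remaining) > 0) remaining else NULL
  }
  attr(v, "label") <- NULL
  v
}

# Hypothetical wrapper: works on a data frame or on a single vector.
clear_labelled <- function(x) {
  if (is.data.frame(x)) {
    x[] <- lapply(x, strip_labelled)  # x[] keeps the data frame structure
    x
  } else {
    strip_labelled(x)
  }
}

# Hand-built stand-in for imported data (an assumption, not spss.get output).
df <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
class(df$id) <- "labelled"
attr(df$id, "label") <- "Identifier"
class(df$grp) <- c("labelled", "factor")
attr(df$grp, "label") <- "Group"

clean <- clear_labelled(df)
```

Because the class vector is filtered with `setdiff` instead of indexed by position, the helper does not depend on where "labelled" sits, and the explicit NULL branch covers vectors whose only class is "labelled".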

Suppose:

library(Hmisc)
w <- spss.get('...')

You could remove the labels of a variable called "var1" by using:

attributes(w$var1)$label <- NULL

If you also want to remove the class "labelled", you could do:

class(w$var1) <- NULL 

or if the variable has more than one class:

class(w$var1) <- class(w$var1)[-which(class(w$var1)=="labelled")]
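One caveat with the `-which()` idiom is worth a quick sketch: when "labelled" is not in the class vector at all, `which()` returns an empty index and the negative subscript then selects nothing. The variable names below are made up for illustration; `setdiff` is shown as an alternative that leaves such a vector untouched.

```r
only_factor <- "factor"            # class vector without "labelled"
both <- c("labelled", "factor")    # class vector with "labelled"

# which() finds no match, so the subscript is -integer(0), i.e. integer(0),
# and every element is dropped.
dropped_by_which <- only_factor[-which(only_factor == "labelled")]

# setdiff() simply returns the input when there is nothing to remove.
kept_by_setdiff <- setdiff(only_factor, "labelled")

# When "labelled" is present, both approaches agree.
same_when_present <- identical(both[-which(both == "labelled")],
                               setdiff(both, "labelled"))
```

So on a variable that was never labelled, the `-which()` form yields an empty class vector, whereas `setdiff(class(w$var1), "labelled")` would keep the existing classes.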

Hope this helps!

Emer

Well, I figured out that the unclass function can be used to remove classes (who would have guessed, eh?!):

library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"

It's not the most fortunate solution, since unclass strips every class and leaves a factor as bare integer codes; just imagine back-converting a bunch of vectors... If anyone tops this, I'll accept it as the answer...
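To make the drawback concrete, here is a sketch on a hand-built labelled factor (not real spss.get output; the values and label text are made up). It contrasts `unclass` with removing only the "labelled" class.

```r
# Hand-built stand-in for a labelled factor.
x <- factor(c("low", "high", "low"))
class(x) <- c("labelled", "factor")
attr(x, "label") <- "Demo label"

# unclass() removes every class, so only the integer codes remain.
foo <- unclass(x)

# Removing just "labelled" keeps the factor, so no back-conversion is needed.
y <- x
class(y) <- setdiff(class(y), "labelled")
attr(y, "label") <- NULL
```

After this, `class(foo)` is "integer", while `y` is still a factor with its original levels.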

aL3xa