
I get an error I did not expect when calculating a standard deviation. The idea [*] is to recode the data so that every missing value becomes 1 and everything else becomes 0, then extract the variables that have some (but not all) values missing before computing a correlation. I attempt that extraction step with sd, but it fails. Why?

library(VIM)
data(sleep) # dataset with missing values

x = as.data.frame(abs(is.na(sleep))) # converts all NA to 1, otherwise 0
y = x[which(sd(x) > 0)] # attempt to extract variables with missing values

Error in is.data.frame(x) : 
(list) object cannot be coerced to type 'double'

# convert to double    
z = as.data.frame(apply(x, 2, as.numeric))
y = z[which(sd(z) > 0)]

Error in is.data.frame(x) : 
(list) object cannot be coerced to type 'double'

[*] R in Action, Robert Kabacoff

Henk

1 Answer


Calling sd() on a data.frame has been defunct since R 3.0.0, as the NEWS entries show:

> ## Build a db of all R news entries.
> db <- news()
> ## sd
> news(grepl("sd", Text), db=db)
Changes in version 3.0.3:

PACKAGE INSTALLATION

    o   The new field SysDataCompression in the DESCRIPTION file allows
        user control over the compression used for sysdata.rda objects in
        the lazy-load database.

Changes in version 3.0.0:

DEPRECATED AND DEFUNCT

    o   mean() for data frames and sd() for data frames and matrices are
        defunct.

Use sapply(x, sd) instead.
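Applied to the extraction step from the question, that could look like the sketch below. It uses a small made-up data frame (`df`, a hypothetical stand-in, not the VIM `sleep` data) so that it runs without extra packages; the name `col_sd` is also just for illustration.

```r
# Hypothetical toy data standing in for VIM's sleep dataset
df <- data.frame(
  a = c(1, NA, 3, 4),          # some values missing
  b = c(1, 2, 3, 4),           # no values missing
  c = c(NA, NA, 5, 6),         # some values missing
  d = c(NA_real_, NA, NA, NA)  # all values missing
)

# 1 where a value is missing, 0 otherwise (same recoding as in the question)
x <- as.data.frame(abs(is.na(df)))

# sd() column by column; the result is a named numeric vector
col_sd <- sapply(x, sd)

# keep only the indicator columns that vary, i.e. some but not all values missing
y <- x[which(col_sd > 0)]
names(y)  # "a" "c"
```

Columns `b` and `d` drop out because their indicator columns are constant (all 0 and all 1 respectively), so their standard deviation is 0.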

Joshua Ulrich
  • Thanks Joshua. These are pretty important functions and it breaks some of the code that I have. :-( – Henk Jun 05 '14 at 10:49
  • @Henk: yeah, it caused problems for quite a few CRAN packages at the time. – Joshua Ulrich Jun 05 '14 at 10:50
  • @Henk You can define your own `mean.data.frame` and `sd.data.frame` functions easily if you don't want to go through your legacy code and change it. – Roland Jun 05 '14 at 10:51
  • Does anyone else notice that using `sapply(x, sd)` makes the code go much much slower? Is there any faster alternative to this method? – Reilstein Oct 04 '16 at 00:35
  • @Reilstein: much slower compared to what? Your comment really should be a new question, but make sure you create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and include some benchmarks to show that it's slower compared to some other method. – Joshua Ulrich Oct 04 '16 at 03:22
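For the legacy-code route suggested in the comments, a minimal sketch might look like this. One caveat worth stating: `mean` is an S3 generic, so a `mean.data.frame` method gets dispatched, but `stats::sd` is an ordinary function, so a function merely named `sd.data.frame` would not be called by `sd(df)`. The sketch therefore masks `sd` with a wrapper instead; this is one possible workaround, not something the comments spell out, and `df` is made-up example data.

```r
# S3 method: mean() is generic, so mean(df) dispatches here
mean.data.frame <- function(x, ...) sapply(x, mean, ...)

# sd() is not generic, so mask it with a wrapper that handles data frames
# column by column and falls back to stats::sd for everything else
sd <- function(x, na.rm = FALSE) {
  if (is.data.frame(x)) {
    sapply(x, stats::sd, na.rm = na.rm)
  } else {
    stats::sd(x, na.rm = na.rm)
  }
}

# Hypothetical example data
df <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6))
mean(df)  # column means, as a named vector
sd(df)    # column standard deviations, as a named vector
```

Masking a base function affects every later call to `sd` in the session, so changing the legacy code to `sapply(x, sd)` is probably the cleaner fix where that is feasible.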