
I have a data.table called td.br.2 in which some columns consist entirely of NAs. These columns are of type numeric. I would like to convert only these columns to factors.

I have tried the following, but it does not work (I do not get an error, but it does not do the job either):

td.br.2[] <- td.br.2[,lapply(.SD, function(x) {ifelse(sum(is.na(x)==nrow(td.br.2)),as.factor(x),x)})]
quant
  • `ifelse` evaluates its condition element by element and returns a result with the same length as that condition. It is designed for vectorized comparisons. `if ... else` checks a single condition and returns the whole selected value. Your use case is the second one. – lmo May 18 '17 at 14:06
  • @lmo I didn't know that (obviously). Very useful information. Thanks! – quant May 18 '17 at 14:12
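A minimal illustration of lmo's point, using a made-up all-NA numeric vector `x` (hypothetical, not the asker's data). With a single `TRUE`/`FALSE` as the condition, `ifelse` returns only one value instead of the whole column, and the factor class is lost. `if ... else` returns the whole object it selects:

```r
x <- c(NA_real_, NA_real_, NA_real_)  # hypothetical all-NA numeric column

# ifelse: the result has the length of the condition (here 1),
# so only the first element of the chosen branch is kept
r_ifelse <- ifelse(all(is.na(x)), as.factor(x), x)

# if ... else: the whole selected object is returned unchanged
r_if <- if (all(is.na(x))) as.factor(x) else x

length(r_ifelse)     # 1
is.factor(r_ifelse)  # FALSE
length(r_if)         # 3
is.factor(r_if)      # TRUE
```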

2 Answers


I am not sure why you would want to do that, but here you are:

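library(data.table)
# TRUE for every column that contains only NAs
naColumns <- sapply(td.br.2, function(x) all(is.na(x)))
# Convert those columns to factor in place (by reference)
for (col in which(naColumns))
    set(td.br.2, j = col, value = as.factor(td.br.2[[col]]))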

The factors will have no levels, but you can deal with that as necessary.

(The for loop is partly based on this.)
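The same conversion can also be written without an explicit loop by using data.table's `:=` together with `.SDcols`. This is a minimal sketch. The table `td` and its columns are made up for illustration:

```r
library(data.table)

# Hypothetical example table: column b is numeric but entirely NA
td <- data.table(a = c(1.5, 2.5, 3.5), b = NA_real_)

# Names of the columns that contain only NAs
naCols <- names(td)[sapply(td, function(x) all(is.na(x)))]

# Convert just those columns to factor, by reference
td[, (naCols) := lapply(.SD, as.factor), .SDcols = naCols]

str(td)
```

Like `set()`, `:=` modifies the table in place, so you do not need to reassign it with `<-`.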

user1310503
  • I get the following error: ``Error in `[<-.data.table`(`*tmp*`, , naColumns, value = NA_integer_) : j must be vector of column name or positions`` – quant May 18 '17 at 13:30
  • Sorry. What I wrote was for a `data.frame`. I have fixed it to work with a `data.table`. – user1310503 May 18 '17 at 13:53
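For comparison, here is a minimal sketch of the same idea for a plain `data.frame`. This is not the answer's original version, which is not shown. `df` is a hypothetical example:

```r
# Hypothetical data.frame: column b is numeric but entirely NA
df <- data.frame(a = c(1.5, 2.5, 3.5), b = NA_real_)

# Logical vector: TRUE for all-NA columns
naColumns <- sapply(df, function(x) all(is.na(x)))

# Single-bracket assignment replaces the selected columns with a list of factors
df[naColumns] <- lapply(df[naColumns], as.factor)
```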
library(data.table)
n <- 10  # number of rows
m <- 10  # number of columns
N <- n * m
m1 <- matrix(runif(N), nrow = n, ncol = m)
dt <- data.table(m1)
names(dt) <- letters[1:m]
dt <- cbind(dt, xxx = rep(NA, nrow(dt)))  # add an all-NA column

At this point:

str(dt)
Classes ‘data.table’ and 'data.frame':  10 obs. of  11 variables:
 $ a  : num  0.661 0.864 0.152 0.342 0.989 ...
 $ b  : num  0.06036 0.67587 0.00847 0.37674 0.30417 ...
 $ c  : num  0.3938 0.6274 0.0514 0.882 0.1568 ...
 $ d  : num  0.777 0.233 0.619 0.117 0.132 ...
 $ e  : num  0.655 0.926 0.277 0.598 0.237 ...
 $ f  : num  0.649 0.197 0.547 0.585 0.685 ...
 $ g  : num  0.6877 0.3676 0.009 0.6975 0.0327 ...
 $ h  : num  0.519 0.705 0.457 0.465 0.966 ...
 $ i  : num  0.43777 0.00961 0.30224 0.58172 0.37621 ...
 $ j  : num  0.44 0.481 0.485 0.125 0.263 ...
 $ xxx: logi  NA NA NA NA NA NA ...

Then executing:

dt <- dt[, lapply(.SD, function(x) if (all(is.na(x))) as.factor(as.character(x)) else x)]

yields:

str(dt)
Classes ‘data.table’ and 'data.frame':  10 obs. of  11 variables:
 $ a  : num  0.0903 0.0448 0.5956 0.418 0.1316 ...
 $ b  : num  0.672 0.582 0.687 0.113 0.371 ...
 $ c  : num  0.404 0.16 0.848 0.863 0.737 ...
 $ d  : num  0.073 0.129 0.243 0.334 0.285 ...
 $ e  : num  0.485 0.186 0.539 0.486 0.784 ...
 $ f  : num  0.4685 0.4815 0.585 0.3596 0.0764 ...
 $ g  : num  0.958 0.194 0.549 0.71 0.737 ...
 $ h  : num  0.168 0.355 0.552 0.765 0.605 ...
 $ i  : num  0.665 0.88 0.23 0.575 0.413 ...
 $ j  : num  0.1113 0.8797 0.1244 0.0741 0.8724 ...
 $ xxx: Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA
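The asker's all-NA columns are numeric, whereas `rep(NA, ...)` above produces a logical column. The following minimal sketch checks that the same `lapply`/`if ... else` expression also handles a numeric all-NA column. `dt2` and `yyy` are hypothetical names:

```r
library(data.table)

# yyy is numeric and entirely NA, matching the type described in the question
dt2 <- data.table(a = runif(5), yyy = rep(NA_real_, 5))

# Same expression as above: all-NA columns become factors, others are untouched
dt2 <- dt2[, lapply(.SD, function(x) if (all(is.na(x))) as.factor(as.character(x)) else x)]

str(dt2)
```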
amonk
  • Thank you. But I do not understand why it does not work with `ifelse` but does work with `if...else`. Or maybe it was the condition inside the `if`? – quant May 18 '17 at 13:58
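Regarding the condition: the question's code also misplaces the closing parenthesis of `sum()`. `sum(is.na(x)==nrow(td.br.2))` compares each `TRUE`/`FALSE` from `is.na(x)` with the row count. For any table with more than one row, those values are never equal, so the sum is always 0 and the test is always `FALSE`, even for an all-NA column. Here is a minimal sketch using a hypothetical all-NA vector `x` of length `n`:

```r
n <- 4
x <- rep(NA_real_, n)  # hypothetical all-NA numeric column

# As written in the question: each TRUE (coerced to 1) is compared with n,
# which never matches when n > 1, so the sum is 0
wrong <- sum(is.na(x) == n)

# Intended check: count the NAs first, then compare the count with n
right <- sum(is.na(x)) == n

# Equivalent and simpler
simpler <- all(is.na(x))
```

So the original attempt has two problems: the condition is always `FALSE`, and `ifelse` returns a single value where a whole column is needed.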