
I am quite new to R and am trying to select certain columns in order to set their NA values to 0.

So far I have:

col_names1 <- c('a','b','c')
col_names2 <- c('e','f','g')
col_names <- c(col_names1, col_names2)
data <- fread('data.tsv', sep = "\t", header = FALSE, na.strings = "NA",
              stringsAsFactors = TRUE,
              colClasses = my_col_Classes)
setnames(data, col_names)
data[col_names2][is.na(data[col_names2])] <- 0

But I keep getting this error:

Error in `[.data.table`(`*tmp*`, column_names2): When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey.

I believe this error is saying that my data is in the wrong order, but I am not sure how to fix it.

agenis
tosh1611
    Just a quick note, but `fread` returns a data table. Since you are new to R, I can imagine that you actually wanted a data frame. If so, within `fread` specify `data.table = FALSE`. Also, keep in mind that missing values are not the same as 0... And depending on what you are doing, that could lead to biases. – slamballais Sep 07 '16 at 10:50

1 Answer


You can do this with data.table's assignment-by-reference operator, := :

data <- data.table(a = c(2, NA, 3, 5), b = c(NA,2,3,4), c = c(2,5,NA, 6))
fix_columns <- c('a','b')    
fix_fun <- function(x) ifelse(is.na(x), 0 , x)

data[,(fix_columns):=lapply(.SD, fix_fun), .SDcols=fix_columns]

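P.S. You can't select columns from a data.table with data[col_names2]. A character vector in the first argument (i) is treated as a join on the table's key, which is why the error asks you to call setkey. To select columns by a character vector, one approach is: data[, col_names2, with = FALSE]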
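As a rough illustration of the column-selection point in the P.S., here is a minimal sketch on a small made-up table. The names `dt`, `cols`, `sub1`, `sub2` and `sub3` are placeholders for this example, and the `..cols` prefix assumes a reasonably recent data.table (1.10.2 or later, as far as I know).

```r
library(data.table)

# Toy data; the column names are illustrative only
dt <- data.table(a = c(2, NA, 3), e = c(NA, 1, 2), f = c(4, NA, 6))
cols <- c("e", "f")

# dt[cols] would try to join on a key, which is the error in the question.
# These three forms select columns by name instead:
sub1 <- dt[, cols, with = FALSE]     # j is taken as a vector of column names
sub2 <- dt[, ..cols]                 # ".." looks up `cols` in the calling scope
sub3 <- dt[, .SD, .SDcols = cols]    # .SD restricted to the chosen columns
```

All three should return a data.table holding only columns e and f. The `with = FALSE` form is the one that also works on older versions.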

Vadym B.
    [Avoid `ifelse`](http://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow). See the question linked in the comments above for a better approach. – MichaelChirico Sep 07 '16 at 19:35
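For reference, one common way to avoid `ifelse` here is to loop over the columns with data.table's `set()`, which overwrites only the NA positions by reference. This is a sketch under that assumption and not necessarily the exact approach in the linked question. It reuses the example data from the answer, with `dt` as a placeholder name.

```r
library(data.table)

# Same example data as in the answer
dt <- data.table(a = c(2, NA, 3, 5), b = c(NA, 2, 3, 4), c = c(2, 5, NA, 6))
fix_columns <- c("a", "b")

# For each chosen column, find the NA rows and overwrite just those cells.
# set() modifies dt in place, so no copy of the table is made.
for (j in fix_columns) {
  set(dt, i = which(is.na(dt[[j]])), j = j, value = 0)
}
```

After the loop, columns a and b contain 0 where they had NA, and column c is left untouched. The replacement value should match the column type (for example `0L` for integer columns).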