22

I am trying to use data.table to recode a variable based on certain conditions. My original dataset has around 30M records and after all variable creation around 130 variables. I used the methods suggested here: conditional statements in data.table (M1) and also here data.table: Proper way to do create a conditional variable when column names are not known? (M2)

My goal is get the equivalent of the below code but something that is applicable using data.table

samp$lf5 <- samp$loadfactor5

samp$lf5 <- with(samp, ifelse(loadfactor5 < 0, 0, lf5))

I will admit that I don't understand .SD and .SDCols very well, so I might be using it wrong. The code and errors from (M1) and (M2) are given below and the sample dataset is here: http://goo.gl/Jp97Wn

(M1)

samp[,lf5 = if(loadfactor5 <0) 0 else loadfactor5]

Error Message

Error in `[.data.table`(samp, , lf5 = if (loadfactor5 < 0) 0 else loadfactor5) : 
unused argument (lf5 = if (loadfactor5 < 0) 0 else loadfactor5)

When I do this:

samp[,list(lf5 = if(loadfactor5 <0) 0 else loadfactor5)]

it gives lf5 as a list but not as part of the samp data.table and does not really apply the condition as lf5 still has values less than 0.

(M2)

Col1 <- "loadfactor5"
Col2 <- "lf5"

setkeyv(samp,Col1)
samp[,(Col2) :=.SD,.SDCols = Col1][Col1<0,(Col2) := .SD, .SDcols = 0]

I get the following error

Error in `[.data.table`(samp, , `:=`((Col2), .SD), .SDCols = Col1) : 
unused argument (.SDCols = Col1)

Any insights on how to finish this appreciated. My dataset has 30M records so I am hoping to use data.table to really cut the run time down.

Thanks,

Krishnan

Community
  • 1
  • 1
Krishnan
  • 1,265
  • 2
  • 13
  • 24
  • 3
    `samp[, lf5 := ifelse(loadfactor5 < 0, 0, loadfactor5)]`; search for `.SD` on SO to find out what it is/does – eddi Aug 29 '14 at 16:17
  • I was able to use this command to get the desired result. – Krishnan Aug 29 '14 at 18:25
  • @Krishnan, could you please answer it yourself and accept it so that the Q remains answered? Thanks. – Arun Aug 30 '14 at 11:58

2 Answers2

40

Answer provided by eddi and included here for the sake of completeness.

samp[, lf5 := ifelse(loadfactor5 < 0, 0, loadfactor5)]

Community
  • 1
  • 1
Krishnan
  • 1,265
  • 2
  • 13
  • 24
  • 2
    I guess the quid of the question is to use ":=" instead of "=". And you could also write samp[, lf5 := loadfactor5*(loadfactor>=0)] – skan Apr 07 '15 at 11:32
  • 4
    you should use `fifelse` which is faster and implemented in data.table >= 1.12.4 – fc9.30 Sep 01 '21 at 08:56
9

Another way (which I prefer because it's, in my opinion, cleaner):

samp[, lf5 := 0]; samp[loadfactor5 > 0, lf5 := loadfactor5];

I use data.table with a dataset with 90M rows; I am continually amazed at how fast data.table is for operations like the above.

BCC
  • 180
  • 1
  • 7