My goal is to do the following:
I have a data.table with > 1 million rows and 4 columns. I want to add a fifth column carrying either a '1' or a '0', depending on the following condition: If there is zero in column 3 followed by any non-zero value in the next row of column 3 (like a sliding window of length 2), a '1' should be added to column 5, otherwise a '0'. The row position of this binary in column 5 should be the same as the non-zero value. Moreover, I want to do this based on a key (i.e. subsets of a column), as the code below shows.
I was unable to implement this in data.table (I have just started using it). I wrote the following code, which does what I want, but it's incredibly slow (>5 s/iteration):
# all_movement_fit_tidy is a data.table with dim 1.2 mio x 4
# set key "fly" in all_movement_fit_tidy
setkey(all_movement_fit_tidy, fly)
genotypes <- unique(all_movement_fit_tidy$fly)
DT_list <- list()
i <- 1
for(genotype in genotypes){
# Get subset of data.table by genotype
genotype_movement <- all_movement_fit_tidy[genotype]
# Calculate initiation list - figure out how to do this in data.table
initiation <- list()
for(j in 1:(nrow(genotype_movement) - 1)){
initiation[j] <- ifelse(genotype_movement[j, "mm"] == 0 &
genotype_movement[j + 1, "mm"] != 0, 1, 0)
}
# Add a 0 to the end of the list to equal number of rows,
# then bind it to genotype_movement
initiation[length(initiation) + 1] <- 0
genotype_movement <- cbind(genotype_movement, as.numeric(initiation))
colnames(genotype_movement)[5] <- "initiation"
# Save this data table in a list of data.tables (DT_list)
DT_list[[i]] <- genotype_movement
i <- i + 1
}
After this, I would bind all entries of the DT_list, which carries all data.tables, into one data.table. The reason why this code is so slow is the following section:
initiation <- list()
for(j in 1:(nrow(genotype_movement) - 1)){
initiation[j] <- ifelse(genotype_movement[j, "mm"] == 0 &
genotype_movement[j + 1, "mm"] != 0, 1, 0)
}
Here, I am looping through each row of the subset of the data.table, and assign the result of the ifelse() function to a list. How can I do this inside the j argument of data.table? I have tried something like:
genotype_movement[, initiation := function(x) for(i in
1:nrow(genotype_movement) ifelse(.SD[i, "mm"] == 0 & .SD[i + 1, "mm" !=
0, 1, 0)]
But this did not work because the ifelse() function returns single values which cannot be assigned to the initiation column.