I know that there are more questions like this. However, I can not get it done. I have a very large data.table with about 800.000 rows. Every row contains some specific data, then some demand data for 2 years (from column 12 onward), and in the last column is a number. This number determines how many demand columns should be replaced.
Example with one row:
ITEM COUNTRY Q1 Q2 Q3 Q4 ---- Q24 NUMBER
1 1 0 0 0 0 ---- 2 2
If this is a vector called x
for example, I would do this:
x[2:(2+x$NUMBER-1)] <- NA
Now I want to do this for all rows. A for loop is way too slow. So I was thinking of apply.
fun_NA <- function(x){ #x is then a row of the data.table
#still use that with x[,] as x is still a data.table and not a vector
if(x[,60]>0){
x[,12:(12+as.numeric(x[,60])-1)] <- NA
}
}
dt = apply(dt, 1, fun_NA)
where dt
is my data.table, which has 60 columns. The first demand value starts in column 12. It does not work.. Now it gives the error "Error in x[,60] : incorrect number of dimensions", but I tried other ways too and received other errors.
Reason why I want to do this: The number represent the month (after first month of my data set) when the item was able to sell, so in the months before, the demand was not 0, but just did not exist. I need it to be NA, as 0 will cause wrong calculations later on.
EDIT: Removed the comma's so this is the new code,
fun_NA <- function(x){ #x is then a row of the data.table
if(x[60]>0){
x[12:(12+as.numeric(x[60])-1)] <- NA
}
}
dt = apply(dt, 1, fun_NA)
However, this returns a large list with all NULL and NA elements..
Edit: the head of the data-table is as follows: (dput)
structure(list(ITEM = c(1, 1, 2, 2, 2, 2), COUNTRY = c(1, 2,
3, 4, 5, 2), DATE = c("2015-02-02", "2015-02-02", "2014-09-27",
"2014-09-27", "2014-09-27", "2014-09-27"), q_1 = c(0, 0, 2, 0,
0, 133), q_2 = c(0, 0, 24, 0, 9, 119), q_3 = c(0, 0, 15, 0, 13,
121), q_4 = c(0, 0, 7, 0, 2, 51), q_5 = c(0, 0, 12, 0, 6, 59),
q_6 = c(0, 0, 3, 0, 0, 36), q_7 = c(0, 0, 6, 0, 6, 41), q_8 = c(0,
0, 19, 0, 4, 42), q_9 = c(0, 0, 3, 0, 5, 48), q_10 = c(0,
0, 5, 0, 11, 49), q_11 = c(0, 0, 6, 0, 1, 42), q_12 = c(0,
0, 0, 0, 8, 70), q_13 = c(0, 0, 1, 0, 19, 81), q_14 = c(0,
0, 5, 0, 98, 86), q_15 = c(0, 0, 12, 0, 10, 152), q_16 = c(0,
0, 7, 0, 8, 95), q_17 = c(0, 0, 30, 0, 5, 62), q_18 = c(0,
0, 6, 0, 10, 47), q_19 = c(0, 0, 7, 0, 1, 35), q_20 = c(2,
0, 7, 0, 0, 47), q_21 = c(0, 2, 16, 5, 4, 70), q_22 = c(0,
0, 7, 0, 7, 46), q_23 = c(0, 0, 8, 0, 79, 20), q_24 = c(0,
0, 5, 0, 26, 45), NUMBER = c(13, 13, 8, 8, 8, 8)), .Names = c("ITEM",
"COUNTRY", "DATE", "q_1", "q_2", "q_3", "q_4", "q_5", "q_6",
"q_7", "q_8", "q_9", "q_10", "q_11", "q_12", "q_13", "q_14",
"q_15", "q_16", "q_17", "q_18", "q_19", "q_20", "q_21", "q_22",
"q_23", "q_24", "NUMBER"), class = c("data.table", "data.frame"
), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000004490788>)