I have a big data.frame: 100,000 observations of 700 variables.
Most of the variables are actually 0 in every observation, and I would like to remove those variables/columns.
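To make the problem reproducible on a small scale, here is a toy stand-in for the real data. The sizes, the random values, and the choice of which columns are non-zero are all assumptions made for illustration; only the name `data` comes from the code below.

```r
# Toy stand-in for the real data (sizes and values are assumptions):
# 1,000 rows and 20 columns, of which only 3 contain non-zero values.
set.seed(1)
data <- as.data.frame(matrix(0, nrow = 1000, ncol = 20))  # columns V1..V20, all zero
data$V3  <- rnorm(1000)
data$V8  <- rpois(1000, lambda = 2)
data$V15 <- runif(1000)

# Columns that should survive once the all-zero columns are removed
expected <- c("V3", "V8", "V15")
```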
I tried the following:
data <- data[!apply(data, 2, function(x){all(x == 0)})]
But the apply call took a long time to finish.
I then tried a while loop, in case the problem was working on all the data at once.
i <- 1
while (i <= ncol(data)) {
  if (all(data[i] == 0)) {
    data[i] <- NULL
  } else {
    i <- i + 1
  }
}
But I had the same problem: it still took a very long time.
So,
- Why does that operation take THAT long? Even though the data.frame is big, the operation is pretty simple.
and, above all
- Is there any way to do this faster?
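
One possible direction, as a minimal sketch rather than a benchmarked answer: `apply()` converts a data.frame to a matrix before iterating, and `data[i] == 0` also compares a one-column data.frame rather than a plain vector, so both attempts do extra copying. The sketch below instead tests each column as a vector with `vapply()` and subsets once. The toy data frame `df` and the names `is_all_zero` and `kept` are hypothetical, and the code assumes numeric columns with no `NA` values.

```r
# Toy data frame (hypothetical): 5 columns, 2 of them all zero
df <- data.frame(a = c(0, 0, 0), b = c(1, 0, 2), c = c(0, 0, 0),
                 d = c(0, 3, 0), e = c(0, 0, 0.5))

# Test each column as a plain vector; vapply returns exactly one logical per column
is_all_zero <- vapply(df, function(col) all(col == 0), logical(1))

# Drop the all-zero columns in a single subsetting step
kept <- df[!is_all_zero]
names(kept)
```

If the data can contain `NA`, the `all(col == 0)` test may itself return `NA`, so that case would need an explicit decision before subsetting.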