testd <- data.frame(A= c(1,2,3,4), B = c(5,6,7,8), C= c(3,4,5,9) )

I would like to remove the rows where testd$A * testd$B * testd$C > 105.

I could solve this with, for example, ifelse: create a fourth column, filter on it, and delete it afterwards. The problem is that my file already takes up almost all available memory, so I can't proceed that way. Is it possible to do this row by row using an apply function?

rowf <- function(x){
  x <- as.data.frame(x)
  ans1 <- x$A * x$B * x$C

  if(ans1 > 105){
    return(NULL)
  } else {
    return(x)
  }
}

apply(testd,1, rowf)

The above is my attempt, but I can't get it to work.

Cyrus Mohammadian
MLEN
  • I'm guessing that `apply` will save result in a new object – HubertL Nov 03 '16 at 19:15
  • Have you tried something like `dplyr::filter(testd, A * B * C <= 105)`? – ulfelder Nov 03 '16 at 19:15
  • I would suggest working on half the dataset until it fits in memory with comfortable margins – HubertL Nov 03 '16 at 19:22
  • `apply` will coerce to a matrix for the calculation and use *more* memory - as well as creating a copy. Use `data.table`. – Gregor Thomas Nov 03 '16 at 19:28
  • `dplyr`, especially its `filter` command, tends to be slower and use more memory than the equivalent `data.table` command (this may have changed with the latest `dplyr` release). Try `DT[DT$A * DT$B * DT$C <= 105]`. [See this discussion](http://stackoverflow.com/a/27520688/5278205) for more on the topic. – Cyrus Mohammadian Nov 03 '16 at 19:52
  • @CyrusMohammadian each of your Boolean expressions will create a temporary Boolean vector, JFI – David Arenburg Nov 03 '16 at 20:37
  • @David Arenburg ahh I see, that would add to memory problems. – Cyrus Mohammadian Nov 03 '16 at 20:39

1 Answer


Using subsetting:

testd <- testd[(testd$A * testd$B * testd$C) <= 105, ]
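For reference, here is a minimal base R sketch of the same subsetting on the toy data from the question. The intermediate `keep` vector is a name introduced here for illustration only. As far as I can tell, the `apply` attempt fails because `apply` converts the data frame to a matrix and passes each row as a plain named vector, so `x$A` is not available inside `rowf`. A vectorized logical index avoids that. It only allocates one product vector and one logical vector, each with one value per row, although the subsetting itself still creates a copy of the kept rows.

```r
# Toy data from the question
testd <- data.frame(A = c(1, 2, 3, 4), B = c(5, 6, 7, 8), C = c(3, 4, 5, 9))

# One logical value per row: TRUE means the row is kept.
# The row products here are 15, 48, 105 and 288.
keep <- with(testd, A * B * C <= 105)

# Overwrite the data frame with the kept rows only
testd <- testd[keep, , drop = FALSE]

# Drop the temporary index so it does not linger in memory
rm(keep)

print(testd)
```

Note that the condition is `<= 105` rather than `< 105`: the question asks to remove rows where the product is greater than 105, so a row whose product is exactly 105 (the third row here) should stay. If this still does not fit in memory on the real file, the `data.table` suggestion from the comments may be worth trying.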
Marcelo