I am trying to vectorize a loop. My application has over 2 million rows, and looping is too slow. I have read this post, which was very helpful: Speed up the loop operation in R

Here is an example of what my data looks like:

m <- data.frame(time = 1:10, level = c(0,0,60,100,0,0,100,100,0,0))

>m
    time level
1     1     0
2     2     0
3     3    60
4     4   100
5     5     0
6     6     0
7     7   100
8     8   100
9     9     0
10   10     0

What I want is a column machine that is either "on" or "off" depending on level.

if level != 0 then machine = "on"

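if level drops to 0, then machine turns off only after a configurable delay, lag. In this example, with lag = 2, the machine stays on for the first row where level is 0 and turns off at the second consecutive zero row, so the result would be: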

    time level machine
1     1     0     off
2     2     0     off
3     3    60      on
4     4   100      on
5     5     0      on
6     6     0     off
7     7   100      on
8     8   100      on
9     9     0      on
10   10     0     off

Any suggestions on how to vectorize this operation? I've looked into using lag from dplyr but haven't found a way to make it work.

I've written a loop that works for this example, as an illustration.

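m$machine <- ifelse(m$level != 0, "on", 0)  # "0" marks rows still to be decided

tlag <- 2
# check to see if timeout period has elapsed
for (i in seq_along(m$machine)) {
    if (m$machine[i] != "on") {
        # window covers the current row and the tlag - 1 rows before it
        nback <- i - (tlag - 1)
        if (nback <= 0) nback <- 1
        if (sum(m$level[nback:i]) == 0) {  # machine should be off
            m$machine[i] <- "off"
        }
    }
}

# rows still marked "0" are inside the timeout window, so the machine is still on
for (i in seq_along(m$machine)) {
    if (m$machine[i] == 0) m$machine[i] <- "on"
}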
Lloyd Christmas

2 Answers

Here is one solution using the dplyr package's lag function. It looks back one row, which matches lag = 2 in the example:

library(dplyr)
m %>% mutate(machine = ifelse((level != 0 |
                               (level == 0 &
                                lag(level, 1, default = 0) != 0)),
                              'on', 'off'))

Output is as follows:

   time level machine
1     1     0     off
2     2     0     off
3     3    60      on
4     4   100      on
5     5     0      on
6     6     0     off
7     7   100      on
8     8   100      on
9     9     0      on
10   10     0     off
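
The expression above looks back exactly one row, which corresponds to lag = 2 in the question. For other values, one possible generalization is a rolling count of nonzero readings built from cumsum. This is only a sketch: tlag, n_on, and result are names introduced here for illustration. It assumes the rows are sorted by time and evenly spaced, and that the machine counts as on whenever any of the last tlag rows (current row included) has a nonzero level.

```r
library(dplyr)

m <- data.frame(time = 1:10, level = c(0, 0, 60, 100, 0, 0, 100, 100, 0, 0))

# Hypothetical parameter: window length in rows, current row included
tlag <- 2L

result <- m %>%
  mutate(
    # running count of nonzero readings up to and including each row
    n_on = cumsum(level != 0),
    # nonzero readings inside the window = running count minus the count tlag rows back
    machine = ifelse(n_on - lag(n_on, tlag, default = 0L) > 0, "on", "off")
  ) %>%
  select(-n_on)
```

With tlag <- 2L this reproduces the machine column from the question. Both cumsum and lag are vectorized, so no per-row loop is involved.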
Gopala

You can do this with data.table:

library(data.table)
m <- data.table(time = 1:10, level = c(0,0,60,100,0,0,100,100,0,0))
m[, machine := {
  lag.level <- shift(level, 1, fill = 0)  # previous row's level; 0 before the first row
  ifelse(level != 0 | lag.level != 0, "on", "off")
}]
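
If the timestamps are not evenly spaced, counting rows may not be what is wanted. A possible alternative tracks the time of the most recent nonzero reading with cummax. This is a sketch that assumes the table is sorted by time and that the timeout is measured in the same units as time. The names tlag and last_on are hypothetical and not part of the code above.

```r
library(data.table)

m <- data.table(time = 1:10, level = c(0, 0, 60, 100, 0, 0, 100, 100, 0, 0))

# Hypothetical parameter: timeout in the same units as the time column
tlag <- 2

m[, machine := {
  # time of the most recent nonzero reading so far (-Inf if there has been none)
  last_on <- cummax(ifelse(level != 0, time, -Inf))
  # on while less than tlag time units have passed since that reading
  ifelse(time - last_on < tlag, "on", "off")
}]
```

On the example data this gives the same machine column as the one-row shift, because the rows there are one time unit apart.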
Bulat