0

I have two columns Time and Event. There are two events A and B. Once an event A takes place, I want to find when the next event B occurs. Column Time_EventB is the desired output.

This is the data frame:

df <- data.frame(Event = sample(c("A", "B", ""), 20, replace = TRUE), Time = paste("t", seq(1,20)))

![enter image description here

  1. What is the code in R for finding the next instance of a value (B in this case)?
  2. What is the code for once the instance of B is found, return the value of the corresponding Time Column?

The code should be something like this:

data$Time_EventB <- ifelse(data$Event == "A", <Code for returning time of next instance of B>, "")

In Excel this can be done using VLOOKUP.

suku
  • 10,507
  • 16
  • 75
  • 120
  • If your R data frame is called `df` then you can remove the blanks in `df$Time_EventB` and write to a new data frame called `newdf` with this code: `newdf <- df[!is.na(df$Time_EventB), ]`. In the future, you can receive better assistance by providing your coding attempts! Good luck. – Jeff Jan 04 '17 at 00:53
  • 2
    Screenshots of data are not helpful. Put it in a delimited text format at a minimum or ideally use `dput` inside R. Also, is there ever a case where `B` doesn't always immediately follow `A`? – thelatemail Jan 04 '17 at 01:03
  • If your events always alternate A to B as are shown, then `your_data$Time_EventB[your_data$Event == "A"] = your_data$Time[your_data$Event == "B"]` should work. – Gregor Thomas Jan 04 '17 at 01:06
  • @Gregor I want the output in `your_data$Time_EventB`. So it should be in some form like this `your_data$Time_EventB <- ifelse(your_data$Event == "A", , "")` – suku Jan 04 '17 at 01:11
  • If you initialize the column to blanks first, `your_data$Time_EventB = ""`, my code should work fine. If you [share your data reproducibly with `dput()` (see this link for more info)](http://stackoverflow.com/q/5963269/903061) I'll be happy to demonstrate. Be open to answers not of the form you expect. – Gregor Thomas Jan 04 '17 at 01:15
  • @thelatemail if there is an A return the next instance of B. So they need not be alternate. But you ignore the 2nd B in an ABB sequence – suku Jan 04 '17 at 01:15
  • If your events don't strictly alternate, then when you create a reproducible example make sure to include that feature. – Gregor Thomas Jan 04 '17 at 01:16
  • @Gregor, Ill prepare the data frame and post here. Thanks for the reference – suku Jan 04 '17 at 01:19

2 Answers2

2

Here's a simple solution:

set.seed(1)
df <- data.frame(Event = sample(c("A", "B", ""),size=20, replace=T), time = 1:20)

as <- which(df$Event == "A")
bs <- which(df$Event == "B")
next_b <- sapply(as, function(a) {
    diff <- bs-a
    if(all(diff < 0)) return(NA)
    bs[min(diff[diff > 0]) == diff]
})
df$next_b <- NA
df$next_b[as] <- df$time[next_b]

> df
   Event time next_b
1      A    1      2
2      B    2     NA
3      B    3     NA
4           4     NA
5      A    5      8
6           6     NA
7           7     NA
8      B    8     NA
9      B    9     NA
10     A   10     14
11     A   11     14
12     A   12     14
13         13     NA
14     B   14     NA
15         15     NA
16     B   16     NA
17         17     NA
18         18     NA
19     B   19     NA
20         20     NA
thc
  • 9,527
  • 1
  • 24
  • 39
  • Thanks. It looks like a deconstructed VLOOKUP. You should make this into a VLOOKUP package for R so that people can use a single formula. – suku Jan 04 '17 at 01:51
0

Here's an attempt using a "rolling join" from the data.table package:

library(data.table)
setDT(df)

df[Event=="B", .(time, nextb=time)][df, on="time", roll=-Inf][Event != "A", nextb := NA][]

#    time nextb Event
# 1:    1     2     A
# 2:    2    NA     B
# 3:    3    NA     B
# 4:    4    NA      
# 5:    5     8     A
# 6:    6    NA      
# 7:    7    NA      
# 8:    8    NA     B
# 9:    9    NA     B
#10:   10    14     A
#11:   11    14     A
#12:   12    14     A
#13:   13    NA      
#14:   14    NA     B
#15:   15    NA      
#16:   16    NA     B
#17:   17    NA      
#18:   18    NA      
#19:   19    NA     B
#20:   20    NA   

Using data as borrowed from @thc

thelatemail
  • 91,185
  • 12
  • 128
  • 188