Conditionally calculate time differences between rows in R

Question

I'm trying to calculate the time difference between each row and the most recent row whose criteria column meets some condition.

Reading in some data:

my_data <- data.frame(criteria = c("some text", "some more text", " ", " ", "more text", " "),
                  timestamp = as.POSIXct(c("2015-07-30 15:53:15", "2015-07-30 15:53:47", "2015-07-30 15:54:48", "2015-07-30 15:55:48", "2015-07-30 15:56:48", "2015-07-30 15:57:49")))

        criteria           timestamp
1      some text 2015-07-30 15:53:15
2 some more text 2015-07-30 15:53:47
3                2015-07-30 15:54:48
4                2015-07-30 15:55:48
5      more text 2015-07-30 15:56:48
6                2015-07-30 15:57:49

I want to get the time difference (in minutes) between every row and the most recent row, at or above it, that isn't blank in the criteria column. Therefore, I want:

        criteria           timestamp time_diff
1      some text 2015-07-30 15:53:15         0
2 some more text 2015-07-30 15:53:47         0
3                2015-07-30 15:54:48         1
4                2015-07-30 15:55:48         2
5      more text 2015-07-30 15:56:48         0
6                2015-07-30 15:57:49         1

So far, I've built the code to recognize where the "0's" should be - I just need the code to fill in the time differences. Here's my code:

my_data$time_diff <- ifelse(my_data$criteria != " ", # TRUE where criteria is not blank (a single space)
  0,  # non-blank rows get a time difference of 0
  NA  # NEED CODE HERE: time since the last non-blank row
  )

I have a feeling that this job may be better performed by something other than an ifelse statement, but I'm relatively new to R.

I've found questions on here where people tried to get time differences between neighboring rows (e.g. here and here), but I have yet to find one that deals with this kind of situation.

The closest question I've found to mine is this one, but those data differ from mine in how the asker wants to process them (at least from my vantage point).

edit: capitalized title.

It seems that for each "timestamp" you need the time difference from the "timestamp"s at `cummax((my_data$criteria != " ") * seq_len(nrow(my_data)))` respectively? — alexis_laz, May 02 '16 at 20:47
@alexis_laz, I think so. To clarify what you mean, I'm comparing each timestamp (e.g. "timestamp3") against the timestamp of the largest row number _at or above "timestamp3"_ where `my_data$criteria != " "`. Is that reading correct? If so, then yes. — Christopher Friedman, May 02 '16 at 21:24
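
To see why the expression in the comments picks the right reference row, it may help to evaluate it piece by piece. This is only an illustrative breakdown on the question's `criteria` values; the intermediate names (`is_filled`, `row_id`, `marked`, `ref_row`) are made up here and do not appear in the original thread.

```r
# The criteria column from the question; a blank is a single space " "
criteria <- c("some text", "some more text", " ", " ", "more text", " ")

is_filled <- criteria != " "        # TRUE TRUE FALSE FALSE TRUE FALSE
row_id    <- seq_along(criteria)    # 1 2 3 4 5 6
marked    <- is_filled * row_id     # row number where filled, 0 where blank: 1 2 0 0 5 0
ref_row   <- cummax(marked)         # running maximum carries the last filled row forward

print(ref_row)                      # 1 2 2 2 5 5
```

Each entry of `ref_row` is the row number of the most recent non-blank row at or above that position, which is the row whose timestamp should be subtracted.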

score 2 · Accepted Answer · answered May 03 '16 at 02:43

Completing the answer with alexis_laz's masterful expression:

my_data <- data.frame(criteria = c("some text", "some more text", " ", " ", "more text", " "),
                      timestamp = as.POSIXct(c("2015-07-30 15:53:15", "2015-07-30 15:53:47", "2015-07-30 15:54:48", "2015-07-30 15:55:48", "2015-07-30 15:56:48", "2015-07-30 15:57:49")))

my_data$time_diff <- 
  my_data$timestamp - 
  my_data[cummax((my_data$criteria != " ") * seq_len(nrow(my_data))), 'timestamp']

my_data

        criteria           timestamp time_diff
1      some text 2015-07-30 15:53:15    0 secs
2 some more text 2015-07-30 15:53:47    0 secs
3                2015-07-30 15:54:48   61 secs
4                2015-07-30 15:55:48  121 secs
5      more text 2015-07-30 15:56:48    0 secs
6                2015-07-30 15:57:49   61 secs
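
One caveat, stated as an assumption about inputs the question does not show: if the very first row had a blank criteria, the index would begin with 0, and R drops zero indices when subsetting, so the subtracted vector would come out shorter than the data. A minimal sketch of one way to guard against that, by turning those positions into `NA` (the three-row data and the `tz = "UTC"` setting are hypothetical, chosen only for reproducibility):

```r
# Hypothetical data whose first row is blank (not from the original question)
criteria  <- c(" ", "some text", " ")
timestamp <- as.POSIXct(c("2015-07-30 15:53:15",
                          "2015-07-30 15:53:47",
                          "2015-07-30 15:54:48"), tz = "UTC")

ref_row <- cummax((criteria != " ") * seq_along(criteria))  # 0 2 2
ref_row[ref_row == 0] <- NA   # no earlier non-blank row exists for these positions

# Indexing with NA yields NA, so the lengths still match
time_diff <- timestamp - timestamp[ref_row]
print(time_diff)
```

Rows that precede the first non-blank row then get `NA` rather than a misaligned value.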

Just as an extra note, `difftime` could be handy here too, with its `units = "mins"` argument — alexis_laz, May 03 '16 at 09:02
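
Following up on the `difftime` suggestion, here is a sketch of how the same calculation might be written to return minutes, as the question asked. The `tz = "UTC"` argument, the `ref_row` and `time_diff_min` names, and the use of `round()` to get whole minutes are assumptions added here, not part of the accepted answer.

```r
my_data <- data.frame(
  criteria  = c("some text", "some more text", " ", " ", "more text", " "),
  timestamp = as.POSIXct(c("2015-07-30 15:53:15", "2015-07-30 15:53:47",
                           "2015-07-30 15:54:48", "2015-07-30 15:55:48",
                           "2015-07-30 15:56:48", "2015-07-30 15:57:49"),
                         tz = "UTC")  # fixed time zone, assumed for reproducibility
)

# Row number of the most recent non-blank criteria at or above each row
ref_row <- cummax((my_data$criteria != " ") * seq_len(nrow(my_data)))

# difftime() lets the unit be chosen explicitly instead of being picked automatically
my_data$time_diff <- difftime(my_data$timestamp,
                              my_data$timestamp[ref_row],
                              units = "mins")

# Plain numeric whole minutes; rounding is an assumption about the desired output
my_data$time_diff_min <- round(as.numeric(my_data$time_diff))

print(my_data)
```

Without the rounding step the differences are fractional minutes (61 seconds is slightly more than 1 minute), so whether to use `round()`, `floor()`, or the raw value depends on what the downstream analysis needs.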
