
I'm trying to calculate the time difference between each row and the most recent row whose column meets some criterion.

Reading in some data:

my_data <- data.frame(criteria = c("some text", "some more text", " ", " ", "more text", " "),
                  timestamp = as.POSIXct(c("2015-07-30 15:53:15", "2015-07-30 15:53:47", "2015-07-30 15:54:48", "2015-07-30 15:55:48", "2015-07-30 15:56:48", "2015-07-30 15:57:49")))

        criteria           timestamp
1      some text 2015-07-30 15:53:15
2 some more text 2015-07-30 15:53:47
3                2015-07-30 15:54:48
4                2015-07-30 15:55:48
5      more text 2015-07-30 15:56:48
6                2015-07-30 15:57:49

I want to get the time difference (in minutes) between every row and the last row that wasn't blank in the criteria column. Therefore, I want:

        criteria           timestamp time_diff
1      some text 2015-07-30 15:53:15         0
2 some more text 2015-07-30 15:53:47         0
3                2015-07-30 15:54:48         1
4                2015-07-30 15:55:48         2
5      more text 2015-07-30 15:56:48         0
6                2015-07-30 15:57:49         1

So far, I've built the code to recognize where the "0's" should be - I just need the code to fill in the time differences. Here's my code:

my_data$time_diff <- ifelse (my_data$criteria != "", # Here's our statement
  my_data$time_diff <- "0", # Here's what happens if statement is TRUE
  my_data$time_diff <- NEED CODE HERE # if statement FALSE
  )

I have a feeling that this job may be better performed by something other than an ifelse statement, but I'm relatively new to R.

I've found questions on here where people tried to get time differences between neighboring rows (e.g. here and here), but I have yet to find someone dealing with this kind of situation.

The closest question I've found to mine is this one, but those data differ from mine in how the asker wants to process them (at least from my vantage point).


  • It seems that for each "timestamp" you need the time difference from the "timestamp"s at `cummax((my_data$criteria != " ") * seq_len(nrow(my_data)))` respectively? – alexis_laz May 02 '16 at 20:47
  • @alexis_laz, I think so. To clarify what you mean, I'm comparing each timestamp (e.g. "timestamp3") against the timestamp of the largest row number _at or above "timestamp3"_ where `my_data$criteria != " "`. Is that reading correct? If so, then yes. – Christopher Friedman May 02 '16 at 21:24

1 Answer


Completing the answer with alexis_laz's masterful expression:

my_data <- data.frame(criteria = c("some text", "some more text", " ", " ", "more text", " "),
                      timestamp = as.POSIXct(c("2015-07-30 15:53:15", "2015-07-30 15:53:47", "2015-07-30 15:54:48", "2015-07-30 15:55:48", "2015-07-30 15:56:48", "2015-07-30 15:57:49")))

my_data$time_diff <- 
  my_data$timestamp - 
  my_data[cummax((my_data$criteria != " ") * seq_len(nrow(my_data))), 'timestamp']

my_data

        criteria           timestamp time_diff
1      some text 2015-07-30 15:53:15    0 secs
2 some more text 2015-07-30 15:53:47    0 secs
3                2015-07-30 15:54:48   61 secs
4                2015-07-30 15:55:48  121 secs
5      more text 2015-07-30 15:56:48    0 secs
6                2015-07-30 15:57:49   61 secs
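
The question asked for minutes, while plain subtraction lets R pick the unit (seconds here). Below is a minimal sketch of one way to get minutes and to see what the index expression does step by step. The helper names `has_text` and `ref_row`, the `trimws()` call, and `tz = "UTC"` are my own additions for illustration, not part of the original expression. It also assumes the first row is non-blank; otherwise `ref_row` would start with 0 and the indexing would drop that element.

```r
# Same sample data as in the question (tz fixed only to make the run reproducible).
my_data <- data.frame(
  criteria = c("some text", "some more text", " ", " ", "more text", " "),
  timestamp = as.POSIXct(c("2015-07-30 15:53:15", "2015-07-30 15:53:47",
                           "2015-07-30 15:54:48", "2015-07-30 15:55:48",
                           "2015-07-30 15:56:48", "2015-07-30 15:57:49"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# TRUE where the row carries text. trimws() is an assumption added here so that
# both "" and " " count as blank.
has_text <- trimws(my_data$criteria) != ""

# Row number where there is text, 0 otherwise; cummax() then carries the most
# recent non-blank row number forward.
ref_row <- cummax(has_text * seq_len(nrow(my_data)))
# ref_row is 1 2 2 2 5 5

# difftime() with an explicit unit, converted to a plain numeric column.
my_data$time_diff <- as.numeric(
  difftime(my_data$timestamp, my_data$timestamp[ref_row], units = "mins")
)
```

The result is fractional minutes (61 seconds is a little over 1 minute), so wrap the expression in `round()` or `floor()` if whole minutes, as in the desired output, are what you want.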
Andrew Lavers