Extract rows for change in variable, according to condition

Question

My data includes time stamps of activities completed by athletes over different periods. Each period goes for a maximum of twenty minutes. The corresponding weather conditions are noted.

I want to call out the first occurrence of weather and when conditions change. My question is similar to this question except I want the first occurrence and when a change occurs.

My data is structured as follows:

  df <- data.frame(Time=c("0:00:00","0:01:00","0:02:40","0:12:09",
                        "0:00:00", "0:02:07","0:07:19","0:15:16",
                        "0:00:00", "0:03:00","0:08:40","0:13:29",
                        "0:00:00", "0:02:10","0:08:47","0:17:55"),
                 Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
                             'Paul', 'Paul', 'Paul','Paul',
                            'Joe', 'Joe', 'Joe', 'Joe',
                            'Joe', 'Joe', 'Joe', 'Joe'),
                 Period = c('P1', 'P1', 'P1', 'P1',
                            'P2', 'P2', 'P2', 'P2',
                            'P1', 'P1', 'P1', 'P1',
                            'P2', 'P2', 'P2', 'P2'),
                 Weather = c('Sunny', 'Sunny', 'Sunny', 'Cloudy',
                            'Rain', 'Cloudy', 'Rain', 'Rain',
                            'Rain', 'Sunny', 'Rain', 'Rain',
                            'Sunny', 'Sunny', 'Cloudy', 'Cloudy'))

How do I call out the first occurrence and change in weather, according to each athlete and period?
How do I have time in minutes.seconds? For example: 2.40

I have attempted the code below but this is not returning my anticipated output.

Test <- df[match(unique(df$Weather), df$Weather),]

My anticipated output would be:

Output <- data.frame(Time = c(0.0, 12.09, 
                              0.0, 2.07, 7.19, 
                              0.0, 3.00, 8.40, 
                              0.0, 8.47), 
                     Athlete = c('Paul', 'Paul', 
                                 'Paul', 'Paul', 'Paul',
                                 'Joe', 'Joe', 'Joe', 
                                 'Joe', 'Joe'), 
                     Period = c('P1', 'P1', 
                                'P2', 'P2', 'P2',
                                'P1', 'P1', 'P1', 
                                'P2', 'P2'), 
                     Weather = c('Sunny', 'Cloudy', 
                                 'Rain', 'Cloudy', 'Rain', 
                                 'Rain', 'Sunny', 'Rain', 
                                 'Sunny', 'Cloudy'))

From this question, I understand the index of a change in factors within a column can be located, how do I arrange this code to have my desired output?

Thank you.

`as.difftime(as.character(df$Time), format="%H:%M:%S")` if you are happy to not use lubridate. — thelatemail, Mar 15 '17 at 04:34

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2017-03-17T06:05:32.877

I'd suggest using something like rleid from "data.table". With that you can do:

library(data.table)
as.data.table(df)[, ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1]
##        Time Athlete Period Weather ind
##  1: 0:00:00    Paul     P1   Sunny   1
##  2: 0:12:09    Paul     P1  Cloudy   1
##  3: 0:00:00    Paul     P2    Rain   1
##  4: 0:02:07    Paul     P2  Cloudy   1
##  5: 0:07:19    Paul     P2    Rain   1
##  6: 0:00:00     Joe     P1    Rain   1
##  7: 0:03:00     Joe     P1   Sunny   1
##  8: 0:08:40     Joe     P1    Rain   1
##  9: 0:00:00     Joe     P2   Sunny   1
## 10: 0:08:47     Joe     P2  Cloudy   1

Regarding your question about "How do I have time in minutes.seconds? For example: 2.40--that's not really a standard way of expressing time, is it? 0.4 * 60 = 24, so I wouldn't recommend converting it in that manner.

If you wanted those values as numeric values, perhaps you can split them out into separate columns. For that, I'd recommend my "splitstackshape" package.

library(splitstackshape)
cSplit(as.data.table(df)[
  , ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1][
    , ind := NULL][], "Time", ":")
##     Athlete Period Weather Time_1 Time_2 Time_3
##  1:    Paul     P1   Sunny      0      0      0
##  2:    Paul     P1  Cloudy      0     12      9
##  3:    Paul     P2    Rain      0      0      0
##  4:    Paul     P2  Cloudy      0      2      7
##  5:    Paul     P2    Rain      0      7     19
##  6:     Joe     P1    Rain      0      0      0
##  7:     Joe     P1   Sunny      0      3      0
##  8:     Joe     P1    Rain      0      8     40
##  9:     Joe     P2   Sunny      0      0      0
## 10:     Joe     P2  Cloudy      0      8     47

"Time_1" would be hours, "Time_2" would be minutes, and "Time_3" would be seconds.

If you really wanted the "Time" column the way you described, you could do that with something like:

df$Time <- as.numeric(sub(":", ".", gsub("^[^:]*:", "", df$Time)))

And then, proceed with the "data.table" approach described earlier.

The "tidyverse" approach (still using rleid from "data.table") might look something like this:

library(tidyverse)
library(data.table)

df %>%
  mutate(ind = rleid(Athlete, Period, Weather)) %>%
  group_by(ind) %>%
  slice(1) %>%
  ungroup() %>%
  select(-ind) %>%
  mutate(Time = as.numeric(sub(":", ".", gsub("^[^:]*:", "", Time))))
# # A tibble: 10 × 4
#     Time Athlete Period Weather
#    <dbl>  <fctr> <fctr>  <fctr>
# 1   0.00    Paul     P1   Sunny
# 2  12.09    Paul     P1  Cloudy
# 3   0.00    Paul     P2    Rain
# 4   2.07    Paul     P2  Cloudy
# 5   7.19    Paul     P2    Rain
# 6   0.00     Joe     P1    Rain
# 7   3.00     Joe     P1   Sunny
# 8   8.40     Joe     P1    Rain
# 9   0.00     Joe     P2   Sunny
# 10  8.47     Joe     P2  Cloudy

If `Time` was in hours:minutes:seconds:milliseconds, for example, 00:12:09:01, how would I please use your `gsub` code to extract 12.09? — user2716568, Mar 19 '17 at 10:23
@user2716568, there would be several ways. Here's one: `x <- "00:12:09:01"; gsub("^.*:(.*:.*):.*$", "\\1", x)`. — A5C1D2H2I1M1N2O1R2T1, Mar 19 '17 at 10:36
Apologies for another question, but how would I extract use your code to extract the minutes.seconds from a column containing hours.minutes.seconds:milliseconds in a data.frame `df$Time` — user2716568, Mar 20 '17 at 04:43
@user2716568, Are you familiar with regular expressions? Regular expressions use parentheses to capture groups. In the comment above, I only captured one group (and returned it with `"\\1"`. You can use multiple groups of parentheses to capture multiple groups of data. — A5C1D2H2I1M1N2O1R2T1, Mar 20 '17 at 04:48
@user2716568, also, that's why I had suggested the `cSplit` approach to split into multiple columns in one go, and deal with the data from there.... — A5C1D2H2I1M1N2O1R2T1, Mar 20 '17 at 04:49
I am very sorry but I am not familiar with regular expressions. How would I adapt the following code: `df$Time <- as.numeric(sub(":", ".", gsub("^[^:]*:", "", df$Time)))` to now read hours:minutes:seconds:milliseconds? — user2716568, Mar 20 '17 at 05:14
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/138495/discussion-between-a5c1d2h2i1m1n2o1r2t1-and-user2716568). — A5C1D2H2I1M1N2O1R2T1, Mar 20 '17 at 05:24

RoyalTS · Answer 2 · 2017-03-17T16:37:25.740

1

The not super elegant version:

df %>% filter(c(1,diff(as.numeric(Athlete)))!=0 |
              c(1,diff(as.numeric(Weather)))!=0 |
              c(1,diff(as.numeric(Period)))!=0)

Note that this requires the data to be sorted the right way.

edited Mar 17 '17 at 16:37

answered Mar 17 '17 at 15:44

RoyalTS

9,545
12
60
101

Extract rows for change in variable, according to condition

2 Answers2