0

My data includes time stamps of activities completed by athletes over different periods. Each period goes for a maximum of twenty minutes. The corresponding weather conditions are noted.

I want to call out the first occurrence of weather and when conditions change. My question is similar to this question except I want the first occurrence and when a change occurs.

My data is structured as follows:

  df <- data.frame(Time=c("0:00:00","0:01:00","0:02:40","0:12:09",
                        "0:00:00", "0:02:07","0:07:19","0:15:16",
                        "0:00:00", "0:03:00","0:08:40","0:13:29",
                        "0:00:00", "0:02:10","0:08:47","0:17:55"),
                 Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
                             'Paul', 'Paul', 'Paul','Paul',
                            'Joe', 'Joe', 'Joe', 'Joe',
                            'Joe', 'Joe', 'Joe', 'Joe'),
                 Period = c('P1', 'P1', 'P1', 'P1',
                            'P2', 'P2', 'P2', 'P2',
                            'P1', 'P1', 'P1', 'P1',
                            'P2', 'P2', 'P2', 'P2'),
                 Weather = c('Sunny', 'Sunny', 'Sunny', 'Cloudy',
                            'Rain', 'Cloudy', 'Rain', 'Rain',
                            'Rain', 'Sunny', 'Rain', 'Rain',
                            'Sunny', 'Sunny', 'Cloudy', 'Cloudy'))
  1. How do I call out the first occurrence and change in weather, according to each athlete and period?
  2. How do I have time in minutes.seconds? For example: 2.40

I have attempted the code below but this is not returning my anticipated output.

Test <- df[match(unique(df$Weather), df$Weather),]

My anticipated output would be:

Output <- data.frame(Time = c(0.0, 12.09, 
                              0.0, 2.07, 7.19, 
                              0.0, 3.00, 8.40, 
                              0.0, 8.47), 
                     Athlete = c('Paul', 'Paul', 
                                 'Paul', 'Paul', 'Paul',
                                 'Joe', 'Joe', 'Joe', 
                                 'Joe', 'Joe'), 
                     Period = c('P1', 'P1', 
                                'P2', 'P2', 'P2',
                                'P1', 'P1', 'P1', 
                                'P2', 'P2'), 
                     Weather = c('Sunny', 'Cloudy', 
                                 'Rain', 'Cloudy', 'Rain', 
                                 'Rain', 'Sunny', 'Rain', 
                                 'Sunny', 'Cloudy'))

From this question, I understand the index of a change in factors within a column can be located, how do I arrange this code to have my desired output?

Thank you.

Community
  • 1
  • 1
user2716568
  • 1,866
  • 3
  • 23
  • 38

2 Answers2

1

I'd suggest using something like rleid from "data.table". With that you can do:

library(data.table)
as.data.table(df)[, ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1]
##        Time Athlete Period Weather ind
##  1: 0:00:00    Paul     P1   Sunny   1
##  2: 0:12:09    Paul     P1  Cloudy   1
##  3: 0:00:00    Paul     P2    Rain   1
##  4: 0:02:07    Paul     P2  Cloudy   1
##  5: 0:07:19    Paul     P2    Rain   1
##  6: 0:00:00     Joe     P1    Rain   1
##  7: 0:03:00     Joe     P1   Sunny   1
##  8: 0:08:40     Joe     P1    Rain   1
##  9: 0:00:00     Joe     P2   Sunny   1
## 10: 0:08:47     Joe     P2  Cloudy   1

Regarding your question about "How do I have time in minutes.seconds? For example: 2.40--that's not really a standard way of expressing time, is it? 0.4 * 60 = 24, so I wouldn't recommend converting it in that manner.

If you wanted those values as numeric values, perhaps you can split them out into separate columns. For that, I'd recommend my "splitstackshape" package.

library(splitstackshape)
cSplit(as.data.table(df)[
  , ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1][
    , ind := NULL][], "Time", ":")
##     Athlete Period Weather Time_1 Time_2 Time_3
##  1:    Paul     P1   Sunny      0      0      0
##  2:    Paul     P1  Cloudy      0     12      9
##  3:    Paul     P2    Rain      0      0      0
##  4:    Paul     P2  Cloudy      0      2      7
##  5:    Paul     P2    Rain      0      7     19
##  6:     Joe     P1    Rain      0      0      0
##  7:     Joe     P1   Sunny      0      3      0
##  8:     Joe     P1    Rain      0      8     40
##  9:     Joe     P2   Sunny      0      0      0
## 10:     Joe     P2  Cloudy      0      8     47

"Time_1" would be hours, "Time_2" would be minutes, and "Time_3" would be seconds.

If you really wanted the "Time" column the way you described, you could do that with something like:

df$Time <- as.numeric(sub(":", ".", gsub("^[^:]*:", "", df$Time)))

And then, proceed with the "data.table" approach described earlier.


The "tidyverse" approach (still using rleid from "data.table") might look something like this:

library(tidyverse)
library(data.table)

df %>%
  mutate(ind = rleid(Athlete, Period, Weather)) %>%
  group_by(ind) %>%
  slice(1) %>%
  ungroup() %>%
  select(-ind) %>%
  mutate(Time = as.numeric(sub(":", ".", gsub("^[^:]*:", "", Time))))
# # A tibble: 10 × 4
#     Time Athlete Period Weather
#    <dbl>  <fctr> <fctr>  <fctr>
# 1   0.00    Paul     P1   Sunny
# 2  12.09    Paul     P1  Cloudy
# 3   0.00    Paul     P2    Rain
# 4   2.07    Paul     P2  Cloudy
# 5   7.19    Paul     P2    Rain
# 6   0.00     Joe     P1    Rain
# 7   3.00     Joe     P1   Sunny
# 8   8.40     Joe     P1    Rain
# 9   0.00     Joe     P2   Sunny
# 10  8.47     Joe     P2  Cloudy
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • If `Time` was in hours:minutes:seconds:milliseconds, for example, 00:12:09:01, how would I please use your `gsub` code to extract 12.09? – user2716568 Mar 19 '17 at 10:23
  • @user2716568, there would be several ways. Here's one: `x <- "00:12:09:01"; gsub("^.*:(.*:.*):.*$", "\\1", x)`. – A5C1D2H2I1M1N2O1R2T1 Mar 19 '17 at 10:36
  • Apologies for another question, but how would I extract use your code to extract the minutes.seconds from a column containing hours.minutes.seconds:milliseconds in a data.frame `df$Time` – user2716568 Mar 20 '17 at 04:43
  • @user2716568, Are you familiar with regular expressions? Regular expressions use parentheses to capture groups. In the comment above, I only captured one group (and returned it with `"\\1"`. You can use multiple groups of parentheses to capture multiple groups of data. – A5C1D2H2I1M1N2O1R2T1 Mar 20 '17 at 04:48
  • @user2716568, also, that's why I had suggested the `cSplit` approach to split into multiple columns in one go, and deal with the data from there.... – A5C1D2H2I1M1N2O1R2T1 Mar 20 '17 at 04:49
  • I am very sorry but I am not familiar with regular expressions. How would I adapt the following code: `df$Time <- as.numeric(sub(":", ".", gsub("^[^:]*:", "", df$Time)))` to now read hours:minutes:seconds:milliseconds? – user2716568 Mar 20 '17 at 05:14
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/138495/discussion-between-a5c1d2h2i1m1n2o1r2t1-and-user2716568). – A5C1D2H2I1M1N2O1R2T1 Mar 20 '17 at 05:24
1

The not super elegant version:

df %>% filter(c(1,diff(as.numeric(Athlete)))!=0 |
              c(1,diff(as.numeric(Weather)))!=0 |
              c(1,diff(as.numeric(Period)))!=0) 

Note that this requires the data to be sorted the right way.

RoyalTS
  • 9,545
  • 12
  • 60
  • 101