0

From the data frame below, I need get unique records, particularly, creation date, where the columns are p01, p02, p03 are equal to 100. So, in the resultant data frame I will get rows with ids (3,6,11,14,19,24,25,26,28 ...)

Id   CREATION_DATE            p01   p02   p31    combined_field
2   2016-06-01 18:38:20.081   100     0    NA      p01_100
3   2016-06-01 19:25:54.733   100     0    NA      p01_100
6   2016-06-02 02:10:01.696     0   100    NA      p02_100
7   2016-06-02 02:36:05.958   100     0    NA      p01_100
8   2016-06-02 02:36:35.263   100     0    NA      p01_100
11  2016-06-02 16:14:48.259   100     0    NA      p01_100
14  2016-06-02 21:30:46.163     0   100    NA      p02_100
15  2016-06-02 22:45:30.451   100     0    NA      p01_100
19  2016-06-04 04:19:51.653   100     0    NA      p01_100
24  2016-06-06 00:04:39.383     0   100    NA      p02_100
25  2016-06-06 09:02:14.595   100     0    NA      p01_100
26  2016-06-06 09:43:00.552     0   100    NA      p02_100
28  2016-06-07 09:06:43.859     0   100    NA      p02_100
31  2016-06-07 21:16:00.166   100     0    NA      p01_100
33   2016-06-08 10:47:14.24   100     0    NA      p01_100
35  2016-06-09 09:40:26.429   100     0    NA      p01_100
39  2016-06-09 23:08:28.582     0   100    NA      p02_100
41  2016-06-13 05:31:34.209     0   100    NA      p02_100
44  2016-06-13 21:38:16.356   100     0    NA      p01_100
...

I tried to combine p01, p02, p03 into combined field so that i can group them.

required.data <- within(required.data, { combined_field <- ifelse(p01 == 100, paste("p01_100"), ifelse(p02 == 100, paste("p02_100"), ifelse(lvs31 == 100, paste("p31_100"),""))) })

I am unable to think of slicing the data frame and pick the latest creation date in each of p01, p02, p03 where it has the last 100 (or before it becomes 0 in the next row).

Expected output:


Id   CREATION_DATE            p01   p02   p31    combined_field
3   2016-06-01 19:25:54.733   100     0    NA      p01_100
6   2016-06-02 02:10:01.696     0   100    NA      p02_100
11  2016-06-02 16:14:48.259   100     0    NA      p01_100
14  2016-06-02 21:30:46.163     0   100    NA      p02_100
19  2016-06-04 04:19:51.653   100     0    NA      p01_100
24  2016-06-06 00:04:39.383     0   100    NA      p02_100
25  2016-06-06 09:02:14.595   100     0    NA      p01_100
28  2016-06-07 09:06:43.859     0   100    NA      p02_100
...

I tried to get the desired output without computing combined_field but ran out of Ideas as R is new language to wrap my head around its syntax.

parishodak
  • 4,506
  • 4
  • 34
  • 48

2 Answers2

2

Using dplyr

library(dplyr)
df %>% # Your data frame as in the post
  group_by(combined_field) %>%
  filter(CREATION_DATE == max(CREATION_DATE)) 
  # Assuming that your date-time variables belongs appropriate time class
Julius
  • 277
  • 2
  • 9
2

We can use slice

library(dplyr)
df %>%
   group_by(combined_field) %>%
   slice(which.max(CREATION_DATE))

Update

Based on OP's comments, may be this helps

library(data.table)
setDT(df)[df[, .I[which.max(CREATION_DATE)],rleid(p01, p02)]$V1]
#    Id       CREATION_DATE p01 p02 p31 combined_field
#1:  3 2016-06-01 19:25:54 100   0  NA        p01_100
#2:  6 2016-06-02 02:10:01   0 100  NA        p02_100
#3: 11 2016-06-02 16:14:48 100   0  NA        p01_100
#4: 14 2016-06-02 21:30:46   0 100  NA        p02_100
#5: 19 2016-06-04 04:19:51 100   0  NA        p01_100
#6: 24 2016-06-06 00:04:39   0 100  NA        p02_100
#7: 25 2016-06-06 09:02:14 100   0  NA        p01_100
#8: 28 2016-06-07 09:06:43   0 100  NA        p02_100
#9: 35 2016-06-09 09:40:26 100   0  NA        p01_100
#10:41 2016-06-13 05:31:34   0 100  NA        p02_100
#11:44 2016-06-13 21:38:16 100   0  NA        p01_100
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • this has given only the max of creation date for 3 variables. but I am trying to get maximum creation date before the 100 value changes to different variable (p01, p02, p31) which tricky to achieve i felt. combined_field is a computed value, but it is not really required, but a intermediate step i am trying to solve the problem. – parishodak Feb 03 '17 at 20:41
  • @parishodak Looks like your post was dupe linked incorrrectly. – akrun Feb 04 '17 at 03:12