Group by, then select only the rows whose value in a particular column is less than that column's value in a reference row

Question

I am new to R.

I have a data frame with 1390 rows and 6 columns, where the last column is the rank.

[Example of the dataset: image not available]

I would like to group by "ID" and then, within each ID, ignore the rows whose rank is higher than the rank of the row where DC is 15001 (highlighted in yellow).

This is what I have tried so far:

SS3<-SS1 %>% group_by(ID) %>% filter(any(DC== 15001) & any(SS1$rank <SS1$rank[DC== 15001]))
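A likely reason this attempt does not behave as intended: inside `filter()`, `SS1$rank` refers to the whole ungrouped column rather than the current ID's rows, and `any(...)` collapses each condition to a single TRUE/FALSE per group, so entire groups are kept or dropped instead of individual rows. Below is a minimal sketch of a row-level grouped filter. The toy data is made up and only stands in for SS1, and the sketch assumes the columns are named ID, DC and rank. If an ID has several 15001 rows, `match()` uses the first one; IDs without a 15001 row are dropped because the comparison yields NA.

```r
library(dplyr)

# Made-up toy data standing in for SS1
SS1 <- data.frame(ID   = c(1, 1, 1, 1, 2, 2, 2),
                  DC   = c(10, 15001, 20, 30, 15001, 40, 50),
                  rank = c(1, 2, 3, 4, 1, 2, 3))

SS3 <- SS1 %>%
  group_by(ID) %>%
  # Bare column names refer to the current ID's rows only.
  # rank[match(15001, DC)] is the rank of the first 15001 row in this ID (NA if none);
  # filter() drops rows where the comparison is FALSE or NA.
  filter(rank <= rank[match(15001, DC)]) %>%
  ungroup()
```

With this toy data, ID 1 keeps ranks 1 and 2 and ID 2 keeps only rank 1.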

[Expected result: image not available]

Welcome to SO. please have a look at [how to ask a great reproducible R question](https://stackoverflow.com/q/5963269/3250126). Share your data (**not** as pictures but as clear text using `dput()`) and code you have already tried. — loki, Jul 24 '17 at 12:09
What do you mean by group_by? In your expected result I just see that you ordered it. — Orhan Yazar, Jul 24 '17 at 12:11
For a given "ID", I have many entries. So I would like to ignore the rows for that "ID" whose rank is higher than that of "15001" (highlighted in yellow). — Swaminathan Sekar, Jul 24 '17 at 12:13
SS3<-SS1 %>% group_by(ID) %>% filter(any(`DC`== 15001) & any(SS1$rank — Swaminathan Sekar, Jul 24 '17 at 12:14
Please edit code and data to the question instead of commenting. — loki, Jul 24 '17 at 12:16
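As an illustration of the `dput()` advice, here is a sketch using a hypothetical two-row stand-in for SS1 (the values are made up). `dput()` prints R code that recreates the object, and that printed text is what would be pasted into the question; for a large data frame, something like `dput(head(SS1, 20))` keeps it short.

```r
# Hypothetical two-row stand-in for the asker's SS1 (values made up)
SS1 <- data.frame(ID   = c(2122051, 2122051),
                  DC   = c(26L, 15001L),
                  rank = 1:2)

# Prints code of the form structure(list(ID = ..., DC = ..., rank = ...), class = "data.frame", ...)
dput(SS1)

# Sanity check: evaluating the printed text rebuilds the same data frame
txt <- paste(capture.output(dput(SS1)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```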

CPak · Answer 1 · 2017-07-24T15:03:16.833

Here is an example similar to the data you provided, with only the columns relevant to your operation. This should work with your own data (given what you've shown):

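set.seed(1)
# Two IDs with 20 rows each; DC is random filler, rank runs 1..20 within each ID
df <- data.frame(ID = c(rep(2122051, 20), rep(2122052, 20)),
                 DC = as.integer(runif(40) * 100),
                 rank = rep(1:20, 2),
                 stringsAsFactors = FALSE)
# Put the reference code 15001 at rank 10 of each ID
df$DC[c(10, 30)] <- as.integer(15001)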

I store the rank of each row where DC == 15001 as a vector:

positions <- df$rank[df$DC == 15001]
[1] 10 10

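I use map2() from purrr (loaded with the tidyverse) to keep, for each ID, only its first positions[i] rows, i.e. the rows ranked at or before the 15001 row. This assumes the rows are sorted by rank within each ID.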
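To see what the map2() step does in isolation, here is a toy sketch on plain vectors rather than the nested data (the chunks and counts are made up): map2() walks two inputs in parallel and calls the function with the i-th element of each, so head() keeps a different number of elements from each chunk.

```r
library(purrr)

# Two made-up "groups" and how many leading elements to keep from each
chunks <- list(1:5, 11:15)
n_keep <- c(2, 4)

# .x is the i-th chunk, .y the i-th count
res <- map2(chunks, n_keep, ~ head(.x, .y))
```

Here `res` holds `1:2` and `11:14`; in the answer, each chunk is one ID's nested data frame instead of a vector.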

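library(tidyverse)
df1 <- df %>%
          group_by(ID) %>%
          nest() %>%      # one row per ID; the other columns go into the list-column `data`
          ungroup() %>%   # recent tidyr versions may keep the grouping after nest(); drop it
          # keep the first positions[i] rows of the i-th ID (IDs must be in the same
          # order as in `positions`, and rows sorted by rank)
          mutate(data = map2(data, positions, ~ head(.x, .y))) %>%
          unnest(data)    # back to one row per observation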

Output

        ID    DC  rank
 1 2122051    26     1
 2 2122051    37     2
 3 2122051    57     3
 4 2122051    90     4
 5 2122051    20     5
 6 2122051    89     6
 7 2122051    94     7
 8 2122051    66     8
 9 2122051    62     9
10 2122051 15001    10
11 2122052    93     1
12 2122052    21     2
13 2122052    65     3
14 2122052    12     4
15 2122052    26     5
16 2122052    38     6
17 2122052     1     7
18 2122052    38     8
19 2122052    86     9
20 2122052 15001    10
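The head()-based approach depends on row order: rows sorted by rank, and IDs in the same order as `positions`. A sketch of an alternative (not the answer's method) that looks up each ID's cutoff by key with a join instead; it rebuilds the same example `df` so it runs on its own, and it assumes any ID without a 15001 row should be dropped.

```r
library(dplyr)

# Recreate the answer's example data
set.seed(1)
df <- data.frame(ID = c(rep(2122051, 20), rep(2122052, 20)),
                 DC = as.integer(runif(40) * 100),
                 rank = rep(1:20, 2))
df$DC[c(10, 30)] <- 15001L

# One row per ID holding the rank of its (first) 15001 row
cutoffs <- df %>%
  filter(DC == 15001) %>%
  group_by(ID) %>%
  summarise(cutoff = min(rank))

# Keep rows ranked at or before the cutoff; the inner join drops IDs with no 15001 row
df2 <- df %>%
  inner_join(cutoffs, by = "ID") %>%
  filter(rank <= cutoff) %>%
  select(-cutoff)
```

Because the cutoff is matched on ID, this gives the same 20 rows as the output above even if the rows were shuffled.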

The rank has been generated based on the repair date for the particular ID. @ Chi Pak — Swaminathan Sekar, Jul 24 '17 at 18:23
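Since the rank comes from the repair date within each ID, here is a sketch of how such a column could be built. The column name `repair_date` and the values are hypothetical. `row_number()` breaks ties by order of appearance; `min_rank()` would give tied dates the same rank instead.

```r
library(dplyr)

# Hypothetical repair records; `repair_date` is an assumed column name
repairs <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  repair_date = as.Date(c("2017-03-01", "2017-01-15", "2017-02-10",
                          "2017-05-05", "2017-04-01"))
)

ranked <- repairs %>%
  group_by(ID) %>%
  # earliest repair within each ID gets rank 1
  mutate(rank = row_number(repair_date)) %>%
  ungroup()
```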
