2

Ultimately, I am looking to create a table that contains each unique ID I have in my dataset with a corresponding field, based on a "ranking" of that field.

I've been very stuck on this step. I have considered looping for each unique ID, but wanted to see if there was an easier way. Perhaps the apply family of functions could be of help. I'm also not sure of a way to rank different string values (e.g., TRUE > FALSE > NA).

Below is a small sample of what I'm looking at, with just the two fields of interest:

df1 <- data.frame(ID = c(1,1,2,2,3,3,3,4,4,5,6,7,7), flag = c("NA", "TRUE", "NA", "FALSE", "TRUE", "TRUE", "FALSE", "NA", "NA", "NA", "TRUE", "FALSE", "FALSE"))

For each ID:

  • If there is at least one "TRUE" in the flag field, I want to pull one of those entire rows (doesn't matter which one).
  • If an ID doesn't contain a "TRUE" value in the flag field, but has a "FALSE" in at least one row, I want to pull one of those entire rows.
  • If an ID does not have a "TRUE" or "FALSE" value in the flag field, I still want one of the "NA" rows.

Below is the separate dataframe that I would ideally like to have:

ideal.df <- data.frame(ID = c(1,2,3,4,5,6,7), flag = c("TRUE", "FALSE", "TRUE", "NA", "NA", "TRUE", "FALSE"))

Thanks in advance for any help!

K.C.
  • 61
  • 7
  • Possible duplicate of https://stackoverflow.com/questions/13279582/select-the-first-row-by-group – akrun Jun 10 '19 at 16:20
  • 1
    I apologize if this is a duplicate question - I was having trouble conceptualizing the issue, so it very well could be. But the answer @Ronak-Shah provided is exactly what I needed and looks quite different from the answer to the question you've linked to. – K.C. Jun 10 '19 at 16:26
  • It's okay. It is one of those questions that we get the first row of each group. I thought it as a dupe. Normally, it would be and would be closed, Since, it is reopened due to a simple technicality, I undeleted my earlier posted answer with a modified one. – akrun Jun 10 '19 at 16:32
  • 1
    @akrun - Thanks for clarifying – K.C. Jun 10 '19 at 16:43
  • 2
    No problem. Please note that dupe tagging is not considered a bad thing. it makes it easier to find a similar post in the future. So, I am just doing a good samaritan service to SO and all the other people who search for a similar post in the future, – akrun Jun 10 '19 at 16:51

2 Answers2

3

An option would be to use slice with match

library(dplyr)
df1 %>% 
   group_by(ID) %>% 
   slice(which.max(match(flag, c("NA", "TRUE", "FALSE"))))
# A tibble: 7 x 2
# Groups:   ID [7]
#     ID flag 
#  <dbl> <fct>
#1     1 TRUE 
#2     2 FALSE
#3     3 FALSE
#4     4 NA   
#5     5 NA   
#6     6 TRUE 
#7     7 FALSE

Or using rank with which.min on the logical 'flag' column

df1 %>% 
   group_by(ID) %>% 
   slice(which.min(rank(as.logical(flag)) ))
# A tibble: 7 x 2
# Groups:   ID [7]
#     ID flag 
#  <dbl> <fct>
#1     1 TRUE 
#2     2 FALSE
#3     3 FALSE
#4     4 NA   
#5     5 NA   
#6     6 TRUE 
#7     7 FALSE
akrun
  • 874,273
  • 37
  • 540
  • 662
2

One option is to convert the flag variable to ordered factor and get the maximum value from each group.

library(dplyr)

df1 %>%
  mutate(flag = factor(flag, levels = c("NA", "TRUE", "FALSE"), ordered = TRUE)) %>%
  group_by(ID) %>%
  slice(which.max(flag))

#     ID flag 
#  <dbl> <ord>
#1     1 TRUE 
#2     2 FALSE
#3     3 FALSE
#4     4 NA   
#5     5 NA   
#6     6 TRUE 
#7     7 FALSE
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213