
I want to set the value in the 'Grade' column to NA for every row whose 'ID' appears more than once.

This is my data frame currently:

ID            Name            Grade
1001          Mary            10
1002          John            9
1002          John            10
1003          James           12

And this is what I want the data frame to look like:

ID            Name            Grade
1001          Mary            10
1002          John            NA
1002          John            NA
1003          James           12
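To make the example reproducible, the input table can be rebuilt as a data frame. This assumes `ID` and `Grade` are integer columns, as the printed output in the first answer suggests:

```r
# Input data from the question; integer column types are an assumption
df <- data.frame(
  ID    = c(1001L, 1002L, 1002L, 1003L),
  Name  = c("Mary", "John", "John", "James"),
  Grade = c(10L, 9L, 10L, 12L)
)
df
```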

How would I go about accomplishing this?

Thanks!

mexica247

2 Answers


You could group by ID and check each group's size:

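library(dplyr)

df %>%
  group_by(ID) %>%
  # n() is the number of rows in the current ID group; blank out Grade when it exceeds 1
  mutate(Grade = ifelse(n() > 1, NA, Grade)) %>%
  ungroup()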

     ID Name  Grade
  <int> <chr> <int>
1  1001 Mary     10
2  1002 John     NA
3  1002 John     NA
4  1003 James    12
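The same group-size check can also be sketched in base R with `ave()`, which computes a result within each group and repeats it on every row of that group. This is an alternative, not part of the answer above, and the integer column types are assumed from the printed output:

```r
df <- data.frame(
  ID    = c(1001L, 1002L, 1002L, 1003L),
  Name  = c("Mary", "John", "John", "James"),
  Grade = c(10L, 9L, 10L, 12L)
)

# ave() applies length() within each ID group and returns that count on every row
group_size <- ave(df$ID, df$ID, FUN = length)

# Rows belonging to a group of size > 1 lose their grade
df$Grade[group_size > 1] <- NA
df
```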
Park

Here are a couple of base R options:

  1. Using duplicated.
# Flag every row whose ID also appears earlier or later in the column
df$Grade[duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)] <- NA
df

#    ID  Name Grade
#1 1001  Mary    10
#2 1002  John    NA
#3 1002  John    NA
#4 1003 James    12
  2. Using table.
# Count rows per ID, keep the IDs seen more than once, then blank out their grades
df$Grade[df$ID %in% names(Filter(function(x) x > 1, table(df$ID)))] <- NA
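Broken into steps, the table option works like this. `table()` stores the ID values as character names, and `%in%` coerces the integer IDs to character before matching. The two-column data frame here is a trimmed-down assumption of the question's data:

```r
df <- data.frame(ID = c(1001L, 1002L, 1002L, 1003L),
                 Grade = c(10L, 9L, 10L, 12L))

# Number of rows per ID: 1001 -> 1, 1002 -> 2, 1003 -> 1
counts <- table(df$ID)

# Keep only the counts above 1; their names are the duplicated IDs as strings
dup_ids <- names(Filter(function(x) x > 1, counts))
dup_ids
# [1] "1002"

# Integer IDs are matched against the character names
df$Grade[df$ID %in% dup_ids] <- NA
df$Grade
# [1] 10 NA NA 12
```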

You can also write the first option with dplyr:

library(dplyr)

df <- df %>% 
       mutate(Grade = replace(Grade, duplicated(ID) | 
                              duplicated(ID, fromLast = TRUE), NA))
df
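Both `duplicated()` calls are needed because `duplicated()` on its own flags only the second and later copies of a value, so the first John row would keep its grade. Adding `fromLast = TRUE` scans from the end and flags the earlier copies. A small check on the ID values from the question:

```r
ids <- c(1001L, 1002L, 1002L, 1003L)

# Only the second 1002 is flagged
duplicated(ids)
# [1] FALSE FALSE  TRUE FALSE

# Scanning from the end flags the first 1002 instead
duplicated(ids, fromLast = TRUE)
# [1] FALSE  TRUE FALSE FALSE

# OR-ing both marks every row whose ID occurs more than once
duplicated(ids) | duplicated(ids, fromLast = TRUE)
# [1] FALSE  TRUE  TRUE FALSE
```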
Ronak Shah
  • I'm getting the following error using the dplyr method: `Error: unexpected symbol in: "districtA <- districtA %>% mutate(gender = replace(gender, duplicated(Student Identifier"` – mexica247 Oct 15 '21 at 07:21
  • You seem not to have copied the code exactly as in my answer. – Ronak Shah Oct 15 '21 at 11:44
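The "unexpected symbol" error points at `Student Identifier`: a column name containing a space is not a syntactic R name, so it has to be wrapped in backticks. A minimal sketch, using hypothetical data for `districtA` (only the names `districtA`, `gender` and `Student Identifier` come from the comment):

```r
# Hypothetical data; check.names = FALSE keeps the space in the column name
districtA <- data.frame(
  `Student Identifier` = c(1L, 2L, 2L, 3L),
  gender = c("F", "M", "M", "F"),
  check.names = FALSE
)

# Backticks let R parse the non-syntactic name
ids <- districtA$`Student Identifier`
districtA$gender[duplicated(ids) | duplicated(ids, fromLast = TRUE)] <- NA
districtA

# The dplyr version needs the same backticks:
# districtA %>%
#   mutate(gender = replace(gender, duplicated(`Student Identifier`) |
#                             duplicated(`Student Identifier`, fromLast = TRUE), NA))
```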