R - Identify and remove ONE instance of duplicate rows

Question

For context: this is a follow-up to a question I recently posted: R - Identify and remove duplicate rows based on two columns

I need to do something very similar to what I described in that post, but let me explain here in full.

I have some data that looks like this (in case it's relevant, there are MANY other columns with other data):

Course_ID   Text_ID
33          17
33          17
58          17
5           22
8           22
42          25
42          25
17          26
17          26
35          39
51          39

I need to identify any instances where there are two or more matching values for Course_ID AND Text_ID. For example, in the data above, the first two rows in both columns are identical (33 and 17). I need to remove just one of these duplicate lines wherever they occur.

The final data should look like this:

Course_ID   Text_ID
33          17
58          17
5           22
8           22
42          25
17          26
35          39
51          39

The solution offered in my previous post removed every instance of a duplicated row, rather than keeping one copy of each.

Thanks in advance.

You can use `dplyr::distinct()` to return only unique rows. – MrFlick, Jul 01 '21 at 15:33

score 1 · Accepted Answer · answered Jul 01 '21 at 15:32

subset(df, !duplicated(df[c('Course_ID', 'Text_ID')]))
   Course_ID Text_ID
1         33      17
3         58      17
4          5      22
5          8      22
6         42      25
8         17      26
10        35      39
11        51      39

or even

df[!duplicated(df[c('Course_ID', 'Text_ID')]), ]

If the data frame has only the 2 columns shown, unique(df) is enough.
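
Since the question mentions many other columns, here is a minimal sketch of the `duplicated()` approach on a data frame with one extra column. The `Note` column and the names `key_cols` and `deduped` are assumptions made up for illustration; they are not part of the original data.

```r
# Sample data from the question, plus a hypothetical extra column `Note`
# standing in for the "many other columns".
df <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39),
  Note      = letters[1:11]
)

# duplicated() flags the second and later occurrences of each key pair,
# so negating it keeps the first row of every (Course_ID, Text_ID) pair.
key_cols <- c("Course_ID", "Text_ID")
deduped <- df[!duplicated(df[key_cols]), ]

nrow(deduped)      # 8 rows remain, one per key pair
nrow(unique(df))   # 11: whole-row uniqueness removes nothing here
```

Because `Note` differs on every row, `unique(df)` leaves the data unchanged in this setup, while restricting `duplicated()` to the two key columns drops the repeats and keeps the other columns of the first matching row.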

score 0 · Answer 2 · answered Jul 01 '21 at 15:34

Does this work?

library(dplyr)
df %>% group_by(Course_ID, Text_ID) %>% distinct()
# A tibble: 8 x 2
# Groups:   Course_ID, Text_ID [8]
  Course_ID Text_ID
      <dbl>   <dbl>
1        33      17
2        58      17
3         5      22
4         8      22
5        42      25
6        17      26
7        35      39
8        51      39
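
If the real data has more columns than the two shown, a `distinct()` call with no columns named should compare entire rows, so rows that differ elsewhere would all be kept. A hedged alternative is to name the key columns and pass `.keep_all = TRUE`, as sketched below. The `Note` column and the name `deduped_dplyr` are hypothetical, added only to make the example self-contained.

```r
library(dplyr)

# Sample data from the question, plus a hypothetical extra column `Note`.
df <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39),
  Note      = letters[1:11]
)

# Keep the first row for each (Course_ID, Text_ID) pair;
# .keep_all = TRUE retains the remaining columns of that row.
deduped_dplyr <- df %>%
  distinct(Course_ID, Text_ID, .keep_all = TRUE)

nrow(deduped_dplyr)   # 8
```

No `group_by()` is needed here, since `distinct()` takes the key columns directly.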
