
In R, I have a data frame called main with a column called Pass.Id, which identifies a particular event. Each value in this column is either unique or appears exactly twice (a pair):

Row  Pass.Id
1      300
2      300
3      301
4      302
5      302
6      303

I want to extract the rows whose Pass.Id appears twice (rows 1, 2, 4 and 5 here) into a new data frame.
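For anyone who wants to try the answers, here is a hypothetical reconstruction of the sample data. It assumes Pass.Id is numeric and treats the Row column as the default row names (1 to 6); the real main probably has more columns. dput() prints code that recreates an object exactly, which is the usual way to share data in a question.

```r
# Hypothetical reconstruction of the sample data from the question.
# "Row" is treated as the default row names, so only Pass.Id is created;
# the real `main` may contain additional columns.
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# Prints code that rebuilds `main`, suitable for pasting into a question
dput(main)
```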

I have spent a lot of time on this but can't work it out. Any help is appreciated.

Vasan
A Student

1 Answer


This is more complicated than it might appear at first. My solution is a little messy but should work, though you may have to adapt it since I don't have a dput() of your data. I'll explain each piece, which will hopefully help.

The first step is to find the Pass.Id values that have a duplicate.

duplicated(main$Pass.Id)
# This is a logical vector with TRUE for the second or later occurrence of some value
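Run on a reconstruction of the question's sample data (a hypothetical one-column data frame, with Pass.Id assumed numeric), this first step flags only the later copy of each pair, not the first one. The name is_repeat is illustrative only.

```r
# Hypothetical sample data reconstructed from the question
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# TRUE only for the second (or later) occurrence of a value
is_repeat <- duplicated(main$Pass.Id)
is_repeat
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE
```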

Because the above logical vector is TRUE only from the second occurrence onward, you need to find the actual values:

main$Pass.Id[duplicated(main$Pass.Id)]
# This is a vector of the same class as main$Pass.Id that contains only the values occurring more than once (a value occurring k times appears k - 1 times here)
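On the same hypothetical reconstruction of the sample data, indexing the column with that logical vector recovers the IDs that belong to a pair. The variable name dup_values is illustrative only.

```r
# Hypothetical sample data reconstructed from the question
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# The IDs whose later copies were flagged by duplicated()
dup_values <- main$Pass.Id[duplicated(main$Pass.Id)]
dup_values
# [1] 300 302
```

If an ID occurred three or more times it would show up more than once in this vector; that is harmless for the %in% step, which only checks membership.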

Then you flag every element of the column that is in the above vector, so you can extract those rows:

main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)]
# This is a logical vector that is TRUE for each occurrence of any value that appears in main$Pass.Id more than once.
# It differs from the first vector because it also includes the first occurrence of a duplicated value (not just the second and later occurrences).

There are a number of ways to extract the rows for which a logical vector is TRUE. In base R, you'd do:

main[main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)], ]
# Don't forget the comma, which says you're extracting rows.
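A sketch of the base R subset on the hypothetical one-column reconstruction of the sample data. Because this toy data frame has a single column, drop = FALSE is added so the result stays a data frame rather than collapsing to a plain vector; with a multi-column main the original line behaves the same without it. The name pairs is illustrative only.

```r
# Hypothetical sample data reconstructed from the question
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# Keep every row whose Pass.Id occurs more than once.
# drop = FALSE only matters here because the toy data has one column.
pairs <- main[main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)], , drop = FALSE]
pairs
```

The result holds rows 1, 2, 4 and 5, and the original row names are preserved.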

With dplyr, you could do:

library(dplyr)  # dplyr::filter masks stats::filter once loaded
filter(main, Pass.Id %in% Pass.Id[duplicated(Pass.Id)])
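A commonly used base R alternative, swapped in here and not part of the approach above, skips the lookup of duplicated values: it combines duplicated() scanned from the front with duplicated(fromLast = TRUE) scanned from the back, so every copy of a repeated ID is flagged. This sketch again uses a hypothetical reconstruction of the sample data; in_group and pairs are illustrative names.

```r
# Hypothetical sample data reconstructed from the question
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# Forward scan flags later copies; fromLast = TRUE flags earlier copies.
# Their union is TRUE for every occurrence of a repeated ID.
in_group <- duplicated(main$Pass.Id) | duplicated(main$Pass.Id, fromLast = TRUE)

# drop = FALSE keeps a data frame for this one-column toy example
pairs <- main[in_group, , drop = FALSE]
pairs
```

It selects the same rows (1, 2, 4 and 5) as the %in% version.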
De Novo