
I have a data frame (call it df) of accidents. Each row has an accident number, a person involved in that accident, and one type of accident. It looks something like this:

x             y           z
accident #1   person A    accident type #1
accident #1   person A    accident type #2
accident #2   person A    accident type #1
accident #2   person B    accident type #2
accident #2   person B    accident type #3
accident #3   person C    accident type #1

In the above case, person A was involved in two accidents. In the first accident, person A was involved with two 'types' of accident. Person B was in accident #2 with person A, but was involved in only one accident, with two accident types. Person C was also involved in only one accident.
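To make the example reproducible, the table above can be rebuilt as a data frame. This is a minimal sketch that simply copies the rows shown; the column names x, y and z are taken from the table header.

```r
# Rebuild the example data from the question (columns x, y, z as in the table)
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE  # keep the columns as character vectors
)
str(df)
```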

I want to collect the subset of people who have been involved in only one accident. However, I want to include all of their accident types (every row for those people). So using the above example, I would want this:

x             y           z
accident #2   person B    accident type #2
accident #2   person B    accident type #3
accident #3   person C    accident type #1

How might I do this in R?

Jaap

2 Answers


You can do this with the dplyr package, using group_by, filter, and n_distinct:

library(dplyr)
df %>%
  group_by(y) %>%                  # one group per person
  filter(n_distinct(x) == 1) %>%   # keep all rows of people with exactly one distinct accident
  ungroup()
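For comparison, here is a base R sketch of the same idea that needs no packages (a swapped-in approach, not part of the answer above): count the distinct accidents per person with tapply(), then keep every row for people whose count is 1. Counting distinct accidents rather than rows matters here: person B has two rows but only one accident, so a plain row count such as dplyr's n() would wrongly drop them. The names accidents_per_person and single are illustrative only.

```r
# Example data from the question
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# Number of distinct accidents per person, named by person
accidents_per_person <- tapply(df$x, df$y, function(v) length(unique(v)))

# People who appear in exactly one distinct accident
single <- names(accidents_per_person)[accidents_per_person == 1]

# Keep every row (i.e. every accident type) for those people, in original order
result <- df[df$y %in% single, ]
result
```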
David Robinson

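We can use data.table. Grouping by y moves y to the first column of the result, so setcolorder() is used to restore the original column order: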

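library(data.table)
# setDT() converts df to a data.table by reference; per person (y), keep .SD
# (all of that person's rows) only if they have exactly one distinct accident (x)
setcolorder(setDT(df)[, .SD[uniqueN(x) == 1], by = y], names(df))[]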
#            x        y                z
#1: accident #2 person B accident type #2
#2: accident #2 person B accident type #3
#3: accident #3 person C accident type #1
akrun