
I have a dataframe like this:

Person  Test
1       new
1       new
1       old
1       old
2       new
2       new
2       old

I want to drop every person whose number of tests on the new system differs from their number of tests on the old system. Here, person 2 was tested twice on new and once on old, so all of that person's rows (the last three) should be removed. How can I do this on a large dataset?

Rui Barradas
leslie_r
  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Jun 13 '20 at 06:14

2 Answers


Here is a base R solution using `ave` and `table`. It keeps a person only when each test type accounts for exactly half of that person's rows.

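# ave() returns values of the same type as its first argument, so the
# per-person flag comes back as "TRUE"/"FALSE" strings; as.character()
# keeps this working when Test is stored as a factor.
i <- with(df1, ave(as.character(Test), Person, FUN = function(x){
  # TRUE only if every test type occurs length(x) %/% 2 times
  all(table(x) == length(x) %/% 2)
}))
df1[as.logical(i), ]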
#  Person Test
#1      1  new
#2      1  new
#3      1  old
#4      1  old
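
To make the per-person counts visible, the same check can be sketched with a cross-tabulation instead of `ave`. This is only an illustrative variant, not the method above: it assumes the data frame is called `df1`, that `Test` contains exactly the values "new" and "old", and the names `counts`, `balanced` and `result` are made up for the example.

```r
# Sample data from the question (the name df1 follows the code above)
df1 <- data.frame(
  Person = c(1, 1, 1, 1, 2, 2, 2),
  Test   = c("new", "new", "old", "old", "new", "new", "old"),
  stringsAsFactors = FALSE
)

# One row per person, one column per test type
counts <- table(df1$Person, df1$Test)

# Persons whose "new" and "old" counts match
balanced <- rownames(counts)[counts[, "new"] == counts[, "old"]]

# Keep only the rows that belong to those persons
result <- df1[as.character(df1$Person) %in% balanced, ]
print(result)
```

Printing `counts` first is a quick way to see which persons are unbalanced before filtering.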
Rui Barradas

You can count how often each test type occurs for each person with `table`, then keep the groups where all of those counts are equal.

This can be done in base R:

subset(df, ave(Test, Person, FUN = function(x) length(unique(table(x)))) == 1)

#  Person Test
#1      1  new
#2      1  new
#3      1  old
#4      1  old

With dplyr:

library(dplyr)
df %>% group_by(Person) %>% filter(n_distinct(table(Test)) == 1)

and with data.table:

library(data.table)
setDT(df)[,.SD[uniqueN(table(Test)) == 1], Person]
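
One thing to watch with the "all counts are equal" test: a person who was tested on only one system has a single count, so they would pass it as well. If such persons can occur in your data, a possible sketch is to also require that both test types are present. The extra person 3 below is hypothetical (it is not in the question), and `keep` and `result` are names chosen for the example.

```r
# Question data plus a hypothetical person 3 who was only tested on "new"
df <- data.frame(
  Person = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  Test   = c("new", "new", "old", "old", "new", "new", "old", "new", "new"),
  stringsAsFactors = FALSE
)

# Work on row indices so ave() returns a numeric 0/1 flag per row
keep <- ave(seq_along(df$Test), df$Person, FUN = function(i) {
  counts <- as.vector(table(df$Test[i]))
  # both test types present AND equal counts
  length(counts) == 2 && length(unique(counts)) == 1
}) == 1

result <- df[keep, ]
print(result)
```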
Ronak Shah