How to use a condition to remove rows in R

Question

I have a dataframe like this:

Person Test  
1 new  
1 new  
1 old  
1 old  
2 new  
2 new  
2 old

and I want to remove every person whose number of tests on the new system differs from the number on the old system. In this case, person 2 was tested twice on new and once on old, so I want to drop all of that person's data (the last three rows). How can I do this on a large dataset?

Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) — Tung, Jun 13 '20 at 06:14

score 1 · Answer 1 · answered Jun 13 '20 at 06:31

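Here is a base R solution using ave and table. It keeps a person only when every test type accounts for exactly half of that person's rows, so it assumes there are two test types.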

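# ave() fills a copy of Test, so with a character Test column the per-group
# TRUE/FALSE is stored as the strings "TRUE"/"FALSE".
i <- with(df1, ave(Test, Person, FUN = function(x){
  all(table(x) == length(x) %/% 2)
}))
# as.logical() converts the strings back before subsetting.
df1[as.logical(i), ]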
#  Person Test
#1      1  new
#2      1  new
#3      1  old
#4      1  old
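
The per-person check above can be inspected step by step. This is a minimal sketch, assuming the sample data from the question is stored as df1 with a character Test column; the names per_person, balanced and kept are illustrative and not part of the answer.

```r
# Sample data reconstructed from the question
df1 <- data.frame(
  Person = c(1, 1, 1, 1, 2, 2, 2),
  Test   = c("new", "new", "old", "old", "new", "new", "old"),
  stringsAsFactors = FALSE
)

# One frequency table per person: this is what table(x) sees inside each group
per_person <- lapply(split(df1$Test, df1$Person), table)

# Same condition as in the answer, evaluated once per person
balanced <- sapply(per_person, function(tab) all(tab == sum(tab) %/% 2))
balanced  # person 1 is TRUE, person 2 is FALSE

# Keep only the rows of the balanced persons
kept <- df1[as.character(df1$Person) %in% names(balanced)[balanced], ]
kept
```

Working with one value per person instead of one value per row avoids the character round trip through ave, at the cost of an extra lookup with %in%.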

Rui Barradas

score 1 · Accepted Answer · answered Jun 13 '20 at 06:45

You can count the frequency of each unique value for each person with table and select the groups where the count is the same for all unique values.

This can be done in base R:

subset(df, ave(Test, Person, FUN = function(x) length(unique(table(x)))) == 1)

#  Person Test
#1      1  new
#2      1  new
#3      1  old
#4      1  old

With dplyr:

library(dplyr)
df %>% group_by(Person) %>% filter(n_distinct(table(Test)) == 1)

and with data.table:

library(data.table)
setDT(df)[,.SD[uniqueN(table(Test)) == 1], Person]
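
One case the question does not cover is a person who took only one type of test: a single count is trivially "the same for all unique values", so the checks above would keep that person. If such a person should be dropped instead, a cross-tabulation that compares the two counts directly is one option. This is a sketch under that assumption; person 3 is a hypothetical addition to the sample data, and counts, keep_ids and result are illustrative names.

```r
# Sample data from the question plus a hypothetical person 3 with only "new" tests
df <- data.frame(
  Person = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  Test   = c("new", "new", "old", "old", "new", "new", "old", "new", "new"),
  stringsAsFactors = FALSE
)

# Rows are persons, columns are test types; a missing combination counts as 0
counts <- table(df$Person, df$Test)

# Keep persons whose "new" count equals their "old" count
keep_ids <- rownames(counts)[counts[, "new"] == counts[, "old"]]

result <- df[as.character(df$Person) %in% keep_ids, ]
result
```

Here person 3 has 2 "new" and 0 "old" tests and is removed along with person 2, leaving only the four rows of person 1.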
