Filtering rows which satisfy a condition per group in R

Question

I have a big data frame and I want to remove every row whose group, as defined by a column in the data frame, has fewer than a given number of rows. Here is an example:

x <- 1:6
y <- c("A", "B", "B", "B", "C", "C")
df <- data.frame(x, y)

If I group by variable y, three rows belong to group "B". I want to remove all rows whose group has fewer than 3 rows, so that only group "B" remains. Expected output:

df
  x y
1 2 B
2 3 B
3 4 B

Is there an easy way to do this?

score 7 · Answer 1 · answered Dec 13 '19 at 16:19

We can use dplyr::filter() and count the number of rows in each group with dplyr::n():

library(dplyr)

df %>%
  group_by(y) %>%
  filter(n() > 2)
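
The same pipeline can be written with the threshold pulled out into a variable. This is only a sketch: min_rows is a name made up here for illustration, df is rebuilt from the question, and the trailing ungroup() is an optional addition so the result is no longer grouped.

```r
library(dplyr)

# Example data from the question
df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

# min_rows is a hypothetical name for the minimum group size to keep
min_rows <- 3

res <- df %>%
  group_by(y) %>%              # evaluate the condition once per value of y
  filter(n() >= min_rows) %>%  # n() is the number of rows in the current group
  ungroup()                    # drop the grouping so later steps see a plain tibble
```

With the example data, only the three rows of group "B" should remain in res.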

M--

akrun · Answer 2 · 2019-12-13T16:27:54.463

4

Another option is data.table:

library(data.table)
setDT(df)[, .SD[.N > 2], by = y]
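
A variant of the same idea, sketched here as an alternative and not as the answer's own code, returns .SD only when the group is large enough. It assumes the df from the question and uses as.data.table(), which copies the data, whereas setDT() converts df by reference.

```r
library(data.table)

# Example data from the question
df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

# as.data.table() makes a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# For each group of y, return its rows (.SD) only if the group has more
# than 2 rows (.N); groups for which the `if` yields NULL are dropped
res <- dt[, if (.N > 2) .SD, by = y]
```

In both forms the grouping column y comes first in the result, followed by x.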

edited Dec 13 '19 at 16:27

answered Dec 13 '19 at 16:20

akrun

score 3 · Answer 3 · answered Dec 13 '19 at 16:25

Using base R

counts <- table(df$y)  # number of rows per value of y; named counts so it does not mask base::t()
df[df$y %in% names(counts[counts > 2]), ]

  x y
2 2 B
3 3 B
4 4 B

manotheshark

ulfelder · Answer 4 · 2019-12-14T17:46:58.617

2

Here's a base R solution using the split, apply, combine approach:

do.call(rbind, lapply(split(df, df$y), function(i) if(nrow(i) >= 3) { i }))
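
The three stages of that one-liner can also be spelled out separately. This is a sketch assuming the df from the question; groups and keep are names chosen here for illustration, and Filter() stands in for the lapply()/if combination.

```r
# Example data from the question
df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

# Split: one data frame per value of y
groups <- split(df, df$y)

# Apply: keep only the pieces with at least 3 rows
keep <- Filter(function(g) nrow(g) >= 3, groups)

# Combine: stack the surviving pieces back into one data frame
res <- do.call(rbind, keep)

# rbind() prefixes row names with the group name (e.g. "B.2"); reset them
rownames(res) <- NULL
```

Resetting the row names is optional and only matters if the "B.2"-style labels get in the way.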

edited Dec 14 '19 at 17:46

answered Dec 13 '19 at 16:30

ulfelder

score 2 · Answer 5 · answered Dec 14 '19 at 18:37

Here is a base R solution that uses ave():

res <- df[ave(seq_len(nrow(df)), df$y, FUN = length) >= 3, ]

and you will get

> res
  x y
2 2 B
3 3 B
4 4 B
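
To see why this works, it may help to look at what ave() returns on its own. A minimal sketch, assuming the df from the question; group_size is a name chosen here for illustration:

```r
# Example data from the question
df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

# ave() applies FUN within each value of df$y and returns a vector aligned
# with the rows, so every row receives the size of its own group
group_size <- ave(seq_len(nrow(df)), df$y, FUN = length)
group_size
# [1] 1 3 3 3 2 2

# Comparing that vector with the threshold gives the logical row index
res <- df[group_size >= 3, ]
```

Because the result is row-aligned, no join or lookup back to the original data frame is needed.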

ThomasIsCoding
