I have a large data frame and want to remove every row whose group, as defined by one of its columns, has fewer than a given number of rows. Here is an example:

x=1:6; y=c("A","B","B","B","C","C")
df<- data.frame(x,y)

If I group by variable y, only group "B" has three rows. I want to remove all rows that belong to a group with fewer than 3 rows. Expected output:

df
  x y
1 2 B
2 3 B
3 4 B

Is there an easy way to do this?

Jason Aller
Kaizen

5 Answers

We can use dplyr::filter() and count the number of rows in each group with dplyr::n():

library(dplyr)

df %>% 
  group_by(y) %>% 
  filter(n()>2)
M--

Another option is

library(data.table)
# setDT() converts df to a data.table in place; .N is the row count of each group
setDT(df)[, .SD[.N > 2], by = y]
akrun

Using base R

# count rows per level of y (named `counts` to avoid masking base::t)
counts <- table(df$y)
df[df$y %in% names(counts[counts > 2]), ]

  x y
2 2 B
3 3 B
4 4 B
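The table() approach above can be wrapped in a small helper so the grouping column and the threshold become arguments. This is only a sketch: the function name `keep_large_groups` and its parameters are made up for illustration and are not part of base R.

```r
# Hypothetical helper: keep only rows whose group has at least `min_rows` rows.
keep_large_groups <- function(data, group_col, min_rows) {
  counts <- table(data[[group_col]])             # rows per group
  keep <- names(counts[counts >= min_rows])      # groups that are large enough
  data[data[[group_col]] %in% keep, , drop = FALSE]
}

df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

res3 <- keep_large_groups(df, "y", 3)  # only group "B" remains
res2 <- keep_large_groups(df, "y", 2)  # groups "B" and "C" remain
```

Note that table() ignores NA by default, so rows with a missing group value are dropped by this sketch.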
manotheshark

Here's a base R solution using the split, apply, combine approach:

do.call(rbind, lapply(split(df, df$y), function(i) if(nrow(i) >= 3) { i }))
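The same split, apply, combine idea can also be spelled out step by step. The variant below uses Filter() in place of the lapply()/if combination; it is an alternative sketch rather than the code from the answer, and the variable names are illustrative.

```r
df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

# split: one data frame per level of y
groups <- split(df, df$y)

# apply: keep only the groups with at least 3 rows
big <- Filter(function(g) nrow(g) >= 3, groups)

# combine: stack the surviving groups back into one data frame
res <- do.call(rbind, big)
rownames(res) <- NULL  # drop the group-prefixed row names that rbind creates
```

If no group meets the threshold, `big` is an empty list and `do.call(rbind, big)` returns NULL, so that case may need separate handling.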
ulfelder

Here is a base R solution that uses ave():

res <- df[ave(seq(nrow(df)), df$y, FUN = length) >= 3, ]

and you will get

> res
  x y
2 2 B
3 3 B
4 4 B
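To see why this works, it may help to look at the intermediate vector that ave() produces: each row is replaced by the size of its own group, so the comparison yields one TRUE/FALSE per row. The names `group_size` and `min_rows` below are introduced only for this illustration.

```r
df <- data.frame(x = 1:6, y = c("A", "B", "B", "B", "C", "C"))

# ave() applies FUN within each group of y and returns a vector
# with one entry per row of the input
group_size <- ave(seq_len(nrow(df)), df$y, FUN = length)
group_size
# 1 3 3 3 2 2

min_rows <- 3                        # illustrative threshold
res <- df[group_size >= min_rows, ]  # keeps rows 2, 3 and 4
```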
ThomasIsCoding