
Sample data

# Three subproducts with 29 monthly observations each, and one ("y") with only 10
date1 <- seq(as.Date("2019/01/01"), by = "month", length.out = 29)
date2 <- seq(as.Date("2019/01/01"), by = "month", length.out = 29)
date3 <- seq(as.Date("2019/01/01"), by = "month", length.out = 29)
date4 <- seq(as.Date("2019/01/01"), by = "month", length.out = 10)

subproducts1 <- rep("1", 29)
subproducts2 <- rep("2", 29)
subproductsx <- rep("x", 29)
subproductsy <- rep("y", 10)

# Random values with mean 5 (no seed is set, so they differ between runs)
b1 <- rnorm(29, 5)
b2 <- rnorm(29, 5)
b3 <- rnorm(29, 5)
b4 <- rnorm(10, 5)

dfone <- data.frame(date = c(date1, date2, date3, date4),
                    subproduct = c(subproducts1, subproducts2,
                                   subproductsx, subproductsy),
                    actuals = c(b1, b2, b3, b4))

Question: How can I remove all subproducts that have 10 or fewer observations?

chriswang123456

2 Answers


We can group by 'subproduct' and keep only the groups with more than 10 observations. The comparison has to be strict (n() > 10): the question asks to drop groups with 10 or fewer rows, and subproduct 'y' has exactly 10, so >= 10 would keep it.

library(dplyr)
dfone %>%
     group_by(subproduct) %>%
     filter(n() > 10) %>%
     ungroup()

Or without any package dependency

subset(dfone, subproduct %in% names(which(table(subproduct) > 10)))
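
As a quick check of the boundary case, here is a minimal base R sketch. It assumes a simplified stand-in for dfone with the same group sizes as the sample data (29, 29, 29 and 10); the names sizes, counts, strict and loose are only illustrative. It compares a strict threshold (> 10) with a non-strict one (>= 10):

```r
# Stand-in data with the same group sizes as the sample: 29, 29, 29 and 10 rows.
sizes <- c("1" = 29, "2" = 29, "x" = 29, "y" = 10)
dfone <- data.frame(
  subproduct = rep(names(sizes), times = sizes),
  actuals = seq_len(sum(sizes)),
  stringsAsFactors = FALSE
)

# Number of rows per subproduct.
counts <- table(dfone$subproduct)

# Strict comparison: keeps only groups with more than 10 rows.
strict <- subset(dfone, subproduct %in% names(which(counts > 10)))

# Non-strict comparison: also keeps groups with exactly 10 rows.
loose <- subset(dfone, subproduct %in% names(which(counts >= 10)))

print(unique(strict$subproduct))
print(unique(loose$subproduct))
```

With these group sizes, only the strict version drops 'y', which is what "10 or fewer" asks for.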
akrun

This can also be done with the count() function from the plyr package.

dfcheck <- plyr::count(dfone$subproduct)
dfcheck <- dfcheck[dfcheck$freq>10,]
dftwo <- dfone[dfone$subproduct %in% dfcheck$x,]

count() returns a data frame with one row per distinct subproduct: the value is in the x column and the number of rows it occurs in is in the freq column. We keep only the rows of dfcheck where freq is greater than 10, and then subset the original data frame to the subproducts that remain in dfcheck$x.
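
The same count-then-subset idea can be sketched without plyr by attaching the group size to every row with base R's ave(). This is only an illustrative alternative, not part of the answer above; it assumes a simplified stand-in for dfone with the sample's group sizes, and the name n_per_row is made up for the example:

```r
# Stand-in data with the same group sizes as the sample: 29, 29, 29 and 10 rows.
sizes <- c("1" = 29, "2" = 29, "x" = 29, "y" = 10)
dfone <- data.frame(
  subproduct = rep(names(sizes), times = sizes),
  actuals = seq_len(sum(sizes)),
  stringsAsFactors = FALSE
)

# For each row, the number of rows that share its subproduct.
n_per_row <- ave(seq_len(nrow(dfone)), dfone$subproduct, FUN = length)

# Keep rows whose subproduct occurs more than 10 times.
dftwo <- dfone[n_per_row > 10, ]

print(table(dftwo$subproduct))
```

Because the count is computed per row, no intermediate lookup table such as dfcheck is needed.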