I am trying to write code that removes rows based on how many observations each level of a categorical variable has. For example, counting the observations per label with the `table` function gives: A: 500, B: 300, C: 90, D: 15, E: 200, F: 300.

I would like to remove the rows whose label has fewer than 100 observations. In this case, that means removing the rows where the categorical variable is C or D.

I can do this semi-manually: 1. run the `table` function and check the counts. 2. `data[!data$categorical %in% c("C", "D"), ]`

However, this becomes tedious as the categorical variable gains more levels. Does anyone know how to do this in one step so that I can apply it to a larger dataset? I would really appreciate any pointers.

Take care

J L
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 19 '21 at 05:28
  • `tt <- table(data$categorical);data2 <- subset(data, categorical %in% names(tt[tt > 100]))` – Ronak Shah Mar 19 '21 at 06:39
  • MrFlick, I will keep that in my mind. – J L Mar 20 '21 at 08:17
  • Ronak, I really appreciate it, thx a lot!! – J L Mar 20 '21 at 08:17
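
The one-liner from the comments can be expanded into a self-contained sketch. The toy data frame below is made up to reproduce the counts in the question; the names `min_n`, `keep_levels`, `data2`, `data3`, and the `value` column are illustrative assumptions, not part of the original data. Note that the comment uses `tt > 100`, which would also drop a label with exactly 100 rows; the sketch uses `>=` to match "fewer than 100" as stated in the question.

```r
# Hypothetical toy data reproducing the counts from the question.
counts <- c(A = 500, B = 300, C = 90, D = 15, E = 200, F = 300)
data <- data.frame(
  categorical = rep(names(counts), times = counts),
  value = seq_len(sum(counts)),
  stringsAsFactors = FALSE
)

min_n <- 100  # assumed threshold: keep labels with at least this many rows

# Count rows per label, then keep only rows whose label meets the threshold.
tt <- table(data$categorical)
keep_levels <- names(tt)[tt >= min_n]
data2 <- data[data$categorical %in% keep_levels, , drop = FALSE]

# Alternative without an intermediate table: ave() gives each row its group size.
group_n <- ave(seq_along(data$categorical), data$categorical, FUN = length)
data3 <- data[group_n >= min_n, , drop = FALSE]

# Inspect the remaining labels and their counts.
table(data2$categorical)
```

If the column is a factor rather than a character vector, the removed labels may still show up as empty levels in later calls to `table`; `droplevels(data2)` removes them.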

0 Answers