How to delete values in a column of the data which appear less than x times?

Question

I have a dataset, shown in this picture (https://i.stack.imgur.com/4Xadd.jpg) (sorry, I'm new to this forum), and I want to remove the rows whose "Target.section" value appears fewer than 4 times, which in this case would be "NN", "HT", "IO" and "BP". How can I do this?

Many Thanks.

Are you subsetting based on the count of one column or more than one? – lmo, Jul 07 '16 at 13:58
Just updated my question again. I explained it badly last time. – Sina PN, Jul 07 '16 at 15:26
Do either of the answers succeed in answering your question? If yes, it is a good idea to select the one you like the best. If neither answers the question, then you should probably put together a minimal example with your data and desired output. Here are some tips to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo, Jul 07 '16 at 15:35

G. Grothendieck · Answer 1 · 2016-07-07T16:10:05.037

1

This removes rows whose value in column x occurs fewer than 3 times (so in this example it would remove the x = 12 rows). No packages are used.

DF <- data.frame(x = c(1, 1, 1, 12, 12, 3, 3, 3, 3), y = 1:9) # test data

subset(DF, ave(seq_along(x), x, FUN = length) >= 3)
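To connect this to the question, here is a minimal sketch of the same ave() idea with a threshold of 4 on a column named Target.section. The data frame dat and its contents are made up for illustration; only the column name comes from the question. ave() returns, for every row, the size of the group that row belongs to, so comparing it with the threshold gives a row filter.

```r
# Hypothetical stand-in for the asker's data (values invented for illustration).
dat <- data.frame(
  Target.section = c("AA", "AA", "AA", "AA", "NN", "HT", "HT", "AA"),
  value = 1:8
)

# For each row, the number of rows sharing its Target.section value.
n_per_row <- ave(seq_along(dat$Target.section), dat$Target.section, FUN = length)

# Keep only rows whose Target.section occurs at least 4 times.
kept <- dat[n_per_row >= 4, ]
print(kept)
```

Here only the "AA" rows survive, because "NN" and "HT" occur fewer than 4 times.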

This would remove rows whose combination of x and y occurs fewer than 3 times (so in this example it would remove all rows, since every y is unique):

subset(DF, ave(seq_along(x), x, y, FUN = length) >= 3)
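As a quick check of why the two-column version drops everything, the sketch below looks at the per-row counts for the same test data. The variable names pair_counts and kept_pairs are only illustrative.

```r
DF <- data.frame(x = c(1, 1, 1, 12, 12, 3, 3, 3, 3), y = 1:9) # same test data

# Count rows per (x, y) combination; each pair is unique because y is 1:9.
pair_counts <- with(DF, ave(seq_along(x), x, y, FUN = length))
print(pair_counts)

# No pair reaches the threshold of 3, so nothing is kept.
kept_pairs <- DF[pair_counts >= 3, ]
print(nrow(kept_pairs))
```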

Next time please provide test input and expected output in the question.

edited Jul 07 '16 at 16:10

answered Jul 07 '16 at 13:57

G. Grothendieck

I think you didn't get my question completely. That was my fault though. I updated my question again and uploaded a picture as a piece of the data set. – Sina PN Jul 07 '16 at 15:23
Please provide reproducible data in your post -- do not provide screen shots. If your data frame is DF then display the output of dput(DF) in your question. I have updated the answer so that it works with non-numeric data as well. – G. Grothendieck Jul 07 '16 at 16:10

score 1 · Answer 2 · answered Jul 07 '16 at 14:07

You can use dplyr (using @G.Grothendieck's data set):

library(dplyr)
DF %>% 
  group_by(x) %>% 
  filter(n() >= 3)
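Applied to the question itself, the same pipeline might look like the sketch below. The data frame dat is invented for illustration (only the column name Target.section comes from the question), and the threshold is 4 as the asker requested. The ungroup() call is optional; it just returns an ungrouped tibble so later steps are not affected by the grouping.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data (values invented for illustration).
dat <- data.frame(
  Target.section = c("AA", "AA", "AA", "AA", "NN", "HT", "HT", "AA"),
  value = 1:8
)

kept <- dat %>%
  group_by(Target.section) %>%  # one group per distinct section
  filter(n() >= 4) %>%          # keep groups with at least 4 rows
  ungroup()                     # drop the grouping from the result

print(kept)
```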

Sotos

score 1 · Accepted Answer · answered Jul 07 '16 at 17:36

We can also use data.table (again with DF from the first answer):

library(data.table)
setDT(DF)[, if(.N >= 3) .SD, by = x]
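One thing to be aware of is that setDT() converts DF to a data.table by reference. If the original data frame should stay untouched, a variant along these lines should work; it uses as.data.table() to make a copy first, and the names DT and kept are only illustrative.

```r
library(data.table)

DF <- data.frame(x = c(1, 1, 1, 12, 12, 3, 3, 3, 3), y = 1:9) # same test data

# as.data.table() copies, so DF itself remains a plain data.frame.
DT <- as.data.table(DF)

# .N is the group size; returning .SD only for large-enough groups drops the rest.
kept <- DT[, if (.N >= 3) .SD, by = x]
print(kept)
```

With this test data the two x = 12 rows are dropped and the remaining seven rows are returned.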

akrun
