0

I have a dataset shown in this picture (https://i.stack.imgur.com/4Xadd.jpg) (sorry, I'm new to this forum), and I want to remove the rows whose "Target.section" value appears fewer than 4 times, which in this case would be "NN", "HT", "IO" and "BP". How can I do this?

Many Thanks.

Sina PN
  • Are you subsetting based on the count of one column or more than one? – lmo Jul 07 '16 at 13:58
  • Just updated my question again. I explained it badly last time. – Sina PN Jul 07 '16 at 15:26
  • Does either of the answers succeed in answering your question? If yes, it is a good idea to select the one you like best. If neither answers the question, then you should probably put together a minimal example with your data and desired output. Here are some tips to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo Jul 07 '16 at 15:35

3 Answers

1

This removes rows whose value in column x occurs fewer than 3 times (so in this example it would remove the x = 12 rows). No packages are used.

DF <- data.frame(x = c(1, 1, 1, 12, 12, 3, 3, 3, 3), y = 1:9) # test data

subset(DF, ave(seq_along(x), x, FUN = length) >= 3)
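
To see why this works, it may help to look at what ave() returns on its own. The sketch below uses the same test data; the names counts and keep are only illustrative and are not part of the answer above.

```r
# Same test data as above
DF <- data.frame(x = c(1, 1, 1, 12, 12, 3, 3, 3, 3), y = 1:9)

# For every row, ave() returns the size of the group that row's x belongs to.
# seq_along() is used as the input vector so that the grouping column itself
# may be numeric, character or factor.
counts <- ave(seq_along(DF$x), DF$x, FUN = length)
print(counts)

# Logical mask: TRUE for rows whose x value occurs at least 3 times
keep <- counts >= 3
print(DF[keep, ])
```

Here counts holds 3 for the x = 1 rows, 2 for the x = 12 rows and 4 for the x = 3 rows, so the mask drops only the two x = 12 rows.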

This would remove rows whose combination of x and y occurs fewer than 3 times (so in this example it would remove all rows, since every (x, y) pair is unique):

subset(DF, ave(seq_along(x), x, y, FUN = length) >= 3)

Next time please provide test input and expected output in the question.
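
Applied to the question as asked (column "Target.section", threshold 4), a minimal sketch might look like the following. The real data is only available as a screenshot, so DF2 here is made up: the sections "AB" and "CD" and the value column are assumptions for illustration, and only "NN", "HT", "IO" and "BP" come from the question.

```r
# Hypothetical stand-in for the data in the screenshot
DF2 <- data.frame(
  Target.section = c(rep("AB", 5), rep("NN", 2), rep("CD", 4),
                     "HT", rep("IO", 3), "BP"),
  value = 1:16,
  stringsAsFactors = FALSE
)

# Same ave() idea as above, with the question's column and threshold
result_ave <- subset(
  DF2,
  ave(seq_along(Target.section), Target.section, FUN = length) >= 4
)

# Alternative with table(): count each section, keep the frequent ones
counts <- table(DF2$Target.section)
frequent <- names(counts)[counts >= 4]
result <- DF2[DF2$Target.section %in% frequent, ]

print(result)
```

Both variants keep only the rows for sections that occur at least 4 times, so in this made-up data the "NN", "HT", "IO" and "BP" rows are dropped.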

G. Grothendieck
  • I think you didn't get my question completely. That was my fault though. I updated my question again and uploaded a picture as a piece of the data set. – Sina PN Jul 07 '16 at 15:23
  • Please provide reproducible data in your post; do not provide screenshots. If your data frame is DF, then display the output of dput(DF) in your question. I have updated the answer so that it works with non-numeric data as well. – G. Grothendieck Jul 07 '16 at 16:10
1

You can use dplyr (using @G.Grothendieck's data set):

library(dplyr)
DF %>% 
  group_by(x) %>% 
  filter(n() >= 3)
Sotos
1

We can also use data.table, keeping each group of x only if it has at least 3 rows:

library(data.table)
setDT(DF)[, if(.N >= 3) .SD, by = x]
akrun