Remove rows where a level of a factor occurs only one time in a data.frame in R

Question

I have the following sample:

Id = c(1, 1, 2, 2, 2, 1, 4, 3, 3, 3)
long =  c("60.466681", "60.664116", "60.766690", "60.86879", "60.986569","60.466681", "60.664116", "60.766690", "60.86879", "60.986569"  )
data = data.frame(Id, long)

I would like to remove the lines where the level of the factor Id occurs only one time in the data.frame. For example here, I would remove the row with Id == 4 and keep the others.

I tried:

data$duplicated <- duplicated(data$Id)
subset(data, data$duplicated == "FALSE")

but this also removes the row where each Id occurs for the first time (i.e. the first rows with Id = 1 or Id = 2):

  Id      long duplicated
1  1 60.466681      FALSE
2  1 60.664116       TRUE
3  2 60.766690      FALSE
4  2  60.86879       TRUE
5  2 60.986569       TRUE
6  1 60.466681       TRUE

Is there an easy way to do this?

Try `gdata::duplicated2` – Vincent Guillemot May 10 '16 at 11:50

qjgods · Accepted Answer · 2016-05-10T12:00:17.340

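library(plyr)

# Split the data by Id and drop every group that contains a single row
data2 <- ddply(data, .(Id), function(x) {
  if (nrow(x) == 1) {
    return(NULL)  # Id occurs only once: discard this group
  } else {
    return(x)     # Id occurs more than once: keep all of its rows
  }
})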

> data2
  Id      long
1  1 60.466681
2  1 60.664116
3  1 60.466681
4  2 60.766690
5  2  60.86879
6  2 60.986569
7  3 60.766690
8  3  60.86879
9  3 60.986569
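
For comparison, the same filtering can probably be done in base R without plyr, building on the duplicated() call from the question. The sketch below is an alternative, not the method used in the answer above: it scans Id from both ends, so that every row whose Id appears more than once is flagged, including the first occurrence. The name `keep` is introduced here for illustration only.

```r
# Sample data from the question
Id <- c(1, 1, 2, 2, 2, 1, 4, 3, 3, 3)
long <- c("60.466681", "60.664116", "60.766690", "60.86879", "60.986569",
          "60.466681", "60.664116", "60.766690", "60.86879", "60.986569")
data <- data.frame(Id, long)

# duplicated() marks every occurrence after the first;
# with fromLast = TRUE it marks every occurrence before the last.
# A row whose Id occurs only once is FALSE in both scans.
keep <- duplicated(data$Id) | duplicated(data$Id, fromLast = TRUE)

# Keep only the rows flagged by at least one scan
data2 <- data[keep, ]
```

Unlike ddply(), which returns the rows grouped by Id, this keeps the rows in their original order and with their original row names.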

edited May 10 '16 at 12:00

answered May 10 '16 at 11:51

That would do the trick, thanks! – Floni May 10 '16 at 12:07
Unfortunately, it does not work with big files (15 million rows); there is a RAM issue that I usually don't have! – Floni Sep 12 '16 at 12:11
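
Regarding the memory issue: a possible workaround is to avoid splitting the data.frame into one piece per Id and instead count the occurrences with vectorised base R functions. The sketch below is an assumption about what might help, and it has not been tested on a 15-million-row file. The names `idx`, `n` and `keep` are hypothetical and used only for illustration.

```r
# Sample data from the question
Id <- c(1, 1, 2, 2, 2, 1, 4, 3, 3, 3)
long <- c("60.466681", "60.664116", "60.766690", "60.86879", "60.986569",
          "60.466681", "60.664116", "60.766690", "60.86879", "60.986569")
data <- data.frame(Id, long)

# Map each row to the position of its Id among the unique Ids
idx <- match(data$Id, unique(data$Id))

# Count how many rows fall on each unique Id
n <- tabulate(idx)

# Keep the rows whose Id was counted more than once
keep <- n[idx] > 1
data2 <- data[keep, ]
```

This works on a few integer vectors of the same length as the data rather than on one sub-data.frame per Id, which is where the split-apply approach tends to spend its memory.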
