
I have not found a clear answer to this question, so hopefully someone can point me in the right direction!

I have a nested data frame (panel data) with multiple observations for each of multiple individuals. I want to subset the data frame to those individuals (id) that have at least 20 rows of data.

I have tried the following:

subset1 = subset(df, table(df$id)[df$id] >= 20) 

However, I still find individuals with fewer than 20 rows of data.
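One possible cause, assuming id is stored as a number: indexing a table with a numeric vector selects entries by position, not by name. So table(df$id)[df$id] only returns the right counts if the ids happen to be exactly 1, 2, ..., n; otherwise it can return the wrong counts or NA. The toy data below is hypothetical and only illustrates the difference between positional and name-based lookup.

```r
# Hypothetical toy panel: id 101 has 3 rows, id 202 has 1 row
df <- data.frame(id = c(101, 101, 101, 202), value = 1:4)

counts <- table(df$id)   # a table with names "101" and "202"

# Numeric index: positions 101 and 202 do not exist in a length-2 table,
# so every lookup comes back as NA
by_position <- counts[df$id]

# Character index: lookup by name returns each row's group size
by_name <- counts[as.character(df$id)]
```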

Can anyone supply a solution?

Thanks in advance

user3237820
  • Please read the info about how to give a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). – Jaap Oct 13 '15 at 16:38
  • Thanks, I will endeavour to do so in the future. – user3237820 Oct 13 '15 at 17:59

1 Answer

subset1 = subset(df, as.logical(table(df$id)[df$id] >= 20)) 

Now, it should work.

The subset function expects its condition argument to evaluate to a series of TRUE and FALSE values, one per row, indicating whether each row meets the condition and should be kept. Hence, the condition must produce a logical vector.
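As a small illustration of that point, using a hypothetical data frame: subset() keeps exactly the rows where the condition is TRUE, drops rows where it is FALSE or NA, and refuses a condition that is not logical at all.

```r
df <- data.frame(id = c("a", "a", "b"), value = c(10, 20, 30))

# One logical value per row: TRUE means "keep this row"
keep <- c(TRUE, FALSE, TRUE)
kept <- subset(df, keep)

# NA is treated like FALSE, so only the first row survives here
kept_na <- subset(df, c(TRUE, NA, FALSE))

# A numeric condition is rejected with an error rather than coerced
res <- tryCatch(subset(df, c(1, 0, 1)), error = function(e) conditionMessage(e))
```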

However, if you run table(df$id)[df$id] >= 20 in the console, you will see that it returns an array rather than a plain logical vector. The fix is straightforward: convert it to a logical vector with as.logical(), and then subset works.
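An alternative sketch that avoids the table lookup entirely: ave() computes the size of each row's group and returns a plain vector with one value per row, so the comparison is already a logical vector of the right length regardless of whether id is numeric, character, or a factor. The panel data below is hypothetical.

```r
# Hypothetical panel: id 1 has 25 rows, id 2 has 10 rows, id 3 has 20 rows
df <- data.frame(
  id   = rep(c(1, 2, 3), times = c(25, 10, 20)),
  time = c(1:25, 1:10, 1:20)
)

# ave() applies length() within each id group and writes the result back
# onto every row of that group; using row indices as x keeps it integer
n_per_id <- ave(seq_len(nrow(df)), df$id, FUN = length)

# Keep individuals with at least 20 rows of data
subset1 <- subset(df, n_per_id >= 20)
```

Passing row indices rather than df$id itself as the first argument matters if id is character: ave() returns a vector of the same type as its first argument, and a character count would then be compared as text.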

Kevin