I have repeated-measures data from two waves. I want to select the people who took the test twice, i.e. whose ID appears twice, and exclude those who took it only once. My data are in long format, with a variable called "wave" that is coded either 1 or 2. So I want to keep the IDs that have both wave 1 and wave 2. Here is my data:

id <- c(1, 2, 3, 4, 5, 6, 1, 2, 4)
wave <- c(1, 1, 2, 1, 2, 2, 2, 2, 2)
df <- cbind(id, wave)

IDs 1, 2, and 4 appear in both waves, and those are the rows I want to extract. Any ideas?

  • Could you please post a little bit of your data, preferably a snippet that exemplifies the issue at hand? – erasmortg Sep 23 '15 at 14:32
  • Here's some info on creating a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Heroka Sep 23 '15 at 14:47

1 Answer

The comments are right on: you should provide your data, an example of what you've tried that demonstrates the problem, and preferably an example of the desired output. Please do that in the future.

Here's an example that hopefully simulates your situation:

set.seed(1)    # for reproducible example
df <- data.frame(ID=c(1:5,1:3),
                 wave=c(rep(1,5),rep(2,3)),
                 x=rnorm(8))
df
#   ID wave          x
# 1  1    1 -0.6264538
# 2  2    1  0.1836433
# 3  3    1 -0.8356286
# 4  4    1  1.5952808
# 5  5    1  0.3295078
# 6  1    2 -0.8204684
# 7  2    2  0.4874291
# 8  3    2  0.7383247

Here's a solution using aggregate(...) in base R.

# base R solution
IDS <- aggregate(wave~ID,df, function(x)length(x)>1)
df[df$ID %in% IDS[IDS$wave,]$ID,]
#   ID wave          x
# 1  1    1 -0.6264538
# 2  2    1  0.1836433
# 3  3    1 -0.8356286
# 6  1    2 -0.8204684
# 7  2    2  0.4874291
# 8  3    2  0.7383247

Here's a solution using data.table.

# data.table solution
library(data.table)
setDT(df)[,lapply(.SD,function(x)x[.N>1]),by=ID]
#    ID wave          x
# 1:  1    1 -0.6264538
# 2:  1    2 -0.8204684
# 3:  2    1  0.1836433
# 4:  2    2  0.4874291
# 5:  3    1 -0.8356286
# 6:  3    2  0.7383247

And a simpler data.table solution (courtesy of @Arun).

setDT(df)[, if (.N > 1L) .SD, by=ID]

All of these keep every row whose ID appears more than once. Note that the test is "more than one row", not "exactly two", and it counts rows rather than distinct waves.
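If you need the stricter condition from the question (an ID must have both wave 1 and wave 2), one possible base R sketch is below. It assumes the question's data are stored as a data frame rather than the matrix that cbind() returns, and the names has_both, keep_ids and both_waves are just illustrative.

```r
# The question's data, stored as a data frame (assumption: the original
# used cbind(), which gives a matrix instead).
df <- data.frame(id   = c(1, 2, 3, 4, 5, 6, 1, 2, 4),
                 wave = c(1, 1, 2, 1, 2, 2, 2, 2, 2))

# For each id, check that both wave 1 and wave 2 are present.
has_both <- tapply(df$wave, df$id, function(w) all(c(1, 2) %in% w))

# tapply() names its result by id, so recover the ids that passed the check.
keep_ids <- as.numeric(names(has_both)[has_both])

# Keep only the rows belonging to those ids (here ids 1, 2 and 4).
both_waves <- df[df$id %in% keep_ids, ]
both_waves
```

Unlike the row-count test, this would also drop an ID that has two rows in the same wave.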

jlhoward
  • Or `setDT(df)[, if (.N > 1L) .SD, by=ID]` – Arun Sep 23 '15 at 16:54
  • Which version of data.table? – jlhoward Sep 23 '15 at 17:08
  • Should work on any version. Which one are you on and what is the issue? – Arun Sep 23 '15 at 17:12
  • I'm on 1.9.6 and it does work, but I see a lot of posts that use features in developmental versions without explaining how to install the developmental version. I wanted to be sure this wasn't an example of that. I'll add to my answer. – jlhoward Sep 23 '15 at 17:20
  • That's a fair question. I try to avoid editing other people's answers; tends to get tense. Sometimes I suggest that respondent edit the question, which also gets tense. – jlhoward Sep 23 '15 at 17:33
  • @Arun BTW [this question](http://stackoverflow.com/questions/32723658/how-to-set-some-values-within-each-column-to-zero-based-on-the-number-of-thier-r/32724973#32724973) is begging for an efficient data.table solution (177,000 rows, 10,000 columns), but I couldn't come up with one. In case you have the time/inclination... – jlhoward Sep 23 '15 at 17:36
  • Okay, will take a look later today. – Arun Sep 23 '15 at 17:42