2

I have a repeated measures data set. I need to remove all Participants where the number of observations for that individual is less than 3. What is the best way to do this?

x <- c(9, 9, 9, 11, 11, 23, 23, 23, 23, 45, 45, 45, 56, 56)

Here 11 and 56 need to be removed from the data. So far I have created a data frame with all the observations that I want to keep, but I am not sure how to filter my data set using the new data frame.

x <- as.data.frame(table(x))
x1 <- x[x$Freq > 2,]
Jonathan Leffler
jusjosgra
  • My data set is rather large (thousands of observations), so this is taking a long time to run. Perhaps there is an alternative using a for loop or something? Running `raw.data1 <- raw.data[ave(raw.data$REGISTRA,raw.data,FUN=length) > 2]` gives `Error: memory exhausted (limit reached?)`. – jusjosgra Nov 07 '11 at 15:14

2 Answers

4

One more use for the ave() function:

x[ave(x,x,FUN=length) > 2]

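In answer to your comment, you should call it like this (note the trailing comma, which selects rows rather than columns, and as.numeric(), which avoids warnings when REGISTRA is a factor):

raw.data1 <- raw.data[ave(as.numeric(raw.data$REGISTRA), raw.data$REGISTRA, FUN=length) > 2, ]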

Also read the help page for ave (?ave); it will help you understand exactly what the code is doing.
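
As a minimal sketch of what ave() is doing, using the example vector from the question: ave() replaces each element with the result of FUN applied to that element's group, so with FUN = length every position ends up holding its group's size. The data frame `df` and its columns `id` and `score` are hypothetical, made up for illustration only.

```r
x <- c(9, 9, 9, 11, 11, 23, 23, 23, 23, 45, 45, 45, 56, 56)

# Each element is replaced by the size of the group it belongs to
counts <- ave(x, x, FUN = length)
counts
# 3 3 3 2 2 4 4 4 4 3 3 3 2 2

# Keep only the values that occur more than twice
kept <- x[counts > 2]

# Hypothetical data frame with a factor id column (an assumption, not the
# asker's real data). The trailing comma selects rows; as.numeric() stops
# ave() from trying to write the counts back into a factor.
df <- data.frame(id = factor(x), score = seq_along(x))
df_kept <- df[ave(as.numeric(df$id), df$id, FUN = length) > 2, ]
```

Without the trailing comma, a data frame is indexed by column, which is likely the source of an "undefined columns selected" error.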

Joris Meys
  • my data set is rather large (1000's obs) so this is taking rather a long time to run. Perhaps there is an alternative using a for loop or something? – jusjosgra Nov 07 '11 at 14:28
  • > raw.data1 <- raw.data[ave(raw.data$REGISTRA,raw.data,FUN=length) > 2] Error: memory exhausted (limit reached?) In addition: > – jusjosgra Nov 07 '11 at 14:29
  • A for loop isn't exactly an alternative for speed. Applying ave correctly would help too ;) – Joris Meys Nov 07 '11 at 15:19
  • This is how I did it at first and it throws out warnings. > raw.data1 <- raw.data[ave(raw.data$REGISTRA,raw.data$REGISTRA,FUN=length) > 2] Error in `[.data.frame`(raw.data, ave(raw.data$REGISTRA, raw.data$REGISTRA, : undefined columns selected In addition: There were 50 or more warnings (use warnings() to see the first 50) > warnings() Warning messages: 1: In `[<-.factor`(`*tmp*`, i, value = 6L) : invalid factor level, NAs generated 2: In `[<-.factor`(`*tmp*`, i, value = 6L) : invalid factor level, NAs generated 3 – jusjosgra Nov 07 '11 at 15:27
  • ahhh think it needs a comma before the ] – jusjosgra Nov 07 '11 at 15:29
  • It is still throwing warnings... oh dear – jusjosgra Nov 07 '11 at 15:32
  • 49: In `[<-.factor`(`*tmp*`, i, value = 3L) : invalid factor level, NAs generated 50: In `[<-.factor`(`*tmp*`, i, value = 4L) : invalid factor level, NAs generated > – jusjosgra Nov 07 '11 at 15:33
  • yeah, if you're running this on a factor, that might give trouble. Try `ave(as.numeric(raw.data$REGISTRA),raw.data$REGISTRA,FUN=Length) > 2` – Joris Meys Nov 07 '11 at 16:14
  • That doesnt seem to be working either: Error in match.fun(FUN) : object 'Length' not found I guess because fun = Length needs to apply to a factor? hmmm – jusjosgra Nov 07 '11 at 16:40
  • FUN=length, small l. it's because Length doesn't exist as a function. – Joris Meys Nov 07 '11 at 17:16
  • corrected this and its still not giving me a useful data frame. It duplicates a load of variables but doesnt provide the frequency data. I think I will just use a for loop to iterate through and count each unique row for each unique id and add that to a column. Thanks for the help though, not sure why its not working. – jusjosgra Nov 08 '11 at 10:22
4
x[x %in% names(table(x)[table(x) >=3])]
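
A hedged sketch of the same idea applied to a data frame, computing the frequency table only once rather than twice. The data frame `df` and its columns `id` and `score` are invented for illustration; in the question the identifier column appears to be `raw.data$REGISTRA`.

```r
x <- c(9, 9, 9, 11, 11, 23, 23, 23, 23, 45, 45, 45, 56, 56)
df <- data.frame(id = factor(x), score = seq_along(x))  # hypothetical data

# Count observations per participant once
freq <- table(df$id)

# Names of the participants with at least 3 observations
keep_ids <- names(freq)[freq >= 3]

# Filter rows; comparing as character works for factors and numbers alike
df_kept <- df[as.character(df$id) %in% keep_ids, ]

# Optionally drop the factor levels that are no longer used
df_kept$id <- droplevels(df_kept$id)
```

Because the comparison is done on names rather than by writing counts back into the column, this variant should not trigger the "invalid factor level" warnings that ave() can produce on a factor.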
Manuel Ramón
  • This would also have answered http://stackoverflow.com/questions/8023315/remove-factors-with-criteria; of course, you would have needed to travel forward in time to know that it was going to be posted. – IRTFM Nov 07 '11 at 16:35
  • Less trouble with data conversion when working with factors. Nice – Joris Meys Nov 07 '11 at 17:54
  • http://stackoverflow.com/questions/8023315/remove-factors-with-criteria was a perfect solution thanks! – jusjosgra Nov 08 '11 at 10:46