This is the original data.

str(demo$ID)
chr [1:5000] "Q05910452" "Q00509389" "Q59112261" "Q38120745" ...

str(ID.unique)
chr [1:4785, 1] "Q00027726" "Q00071545" "Q00073883" "Q00077269" ...

I want to make two data sets. The first should contain the 4785 IDs from demo$ID that also appear in ID.unique.

The second should contain the remaining IDs (215 IDs = 5000 - 4785), which are not in ID.unique.

How can I do this? Any help would be much appreciated. Thank you very much.

Doo Hyun Shin
  • You can use set operations for this. See `?intersect` and `?setdiff`. – jbaums Jan 04 '15 at 09:56
  • Next time please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Thank you. – David Arenburg Jan 04 '15 at 11:23

1 Answer

You could try

indx <- demo$ID %in% ID.unique #TRUE where the ID occurs in ID.unique
lst <- split(demo, indx+1) #returns a list with two elements: "1" = IDs not in ID.unique, "2" = IDs in ID.unique
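
Since `indx` is logical, `indx+1` is 1 for IDs absent from `ID.unique` and 2 for IDs present, so the list elements are named "1" and "2". A minimal sketch of pulling the two data sets out of the list, using small made-up data (the toy IDs and the names `not_in_unique` and `in_unique` are assumptions for illustration):

```r
# Toy data for illustration only; these IDs are made up
ID.unique <- c("Q001", "Q002", "Q003")
demo <- data.frame(ID = c("Q003", "Q900", "Q001", "Q901", "Q002"),
                   Col2 = 1:5, stringsAsFactors = FALSE)

indx <- demo$ID %in% ID.unique   # TRUE where the ID occurs in ID.unique
lst <- split(demo, indx + 1)     # "1" = not in ID.unique, "2" = in ID.unique

not_in_unique <- lst[["1"]]      # rows whose ID is missing from ID.unique
in_unique     <- lst[["2"]]      # rows whose ID is found in ID.unique
```

If every ID matched (or none did), the list would have only one element, so it may be worth checking `names(lst)` before indexing.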

data

ID.unique <- paste0('Q000', 1:5000)
set.seed(24)
demo <- data.frame(ID=sample(c(ID.unique, paste0('Q000', 5001:6000)),
              5000,replace=FALSE), Col2=rnorm(5000))
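
The same separation can also be sketched with plain logical subsetting, cross-checked against the set operations mentioned in the comments (`intersect`, `setdiff`). The toy IDs and the names `matched` and `unmatched` are assumptions for illustration, not part of the question's data:

```r
# Toy data for illustration only; these IDs are made up
ID.unique <- c("Q001", "Q002", "Q003")
demo <- data.frame(ID = c("Q003", "Q900", "Q001", "Q901", "Q002"),
                   Col2 = 1:5, stringsAsFactors = FALSE)

indx <- demo$ID %in% ID.unique   # TRUE where the ID occurs in ID.unique
matched   <- demo[indx, ]        # rows with an ID in ID.unique
unmatched <- demo[!indx, ]       # the remaining rows

# Cross-check the two pieces against base R set operations
stopifnot(setequal(matched$ID, intersect(demo$ID, ID.unique)))
stopifnot(setequal(unmatched$ID, setdiff(demo$ID, ID.unique)))
```

Unlike `intersect` and `setdiff`, which return only the ID vectors, subsetting with `indx` keeps the other columns of `demo`.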
akrun