
I have a data frame that relates bottle numbers to their volumes (`key` in the example below). I want to write a function that takes any vector of bottle numbers (`samp`) and returns a vector of the corresponding bottle volumes, in the same order as the bottle numbers in `samp`.

The function below correctly matches the bottle numbers and volumes but sorts the output by ascending bottle number.

How can I keep the order of `samp` with `merge`? Setting `sort=FALSE` returns the rows in an "unspecified order".

Example

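samp <- c(9, 1, 4, 1)
num <- 1:10
vol <- sample(50:100, 10)
key <- data.frame(num, vol)
matchFun <- function(samp, key) {
  # merge() sorts on the join column by default, so the result is
  # ordered by bottle number rather than by the order of samp
  out <- merge(as.data.frame(samp), key, by.x = "samp", by.y = "num")
  return(out$vol)
}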
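A deterministic version of the example makes the sorting easy to see. This sketch is only an illustration: it swaps `sample(50:100, 10)` for placeholder volumes `seq(55, 100, by = 5)`, so bottle 1 holds 55, bottle 4 holds 70 and bottle 9 holds 95.

```r
samp <- c(9, 1, 4, 1)
num <- 1:10
vol <- seq(55, 100, by = 5)  # placeholder volumes instead of sample(50:100, 10)
key <- data.frame(num, vol)

# merge() defaults to sort = TRUE, so rows are ordered by the join column
out <- merge(as.data.frame(samp), key, by.x = "samp", by.y = "num")
print(out$samp)  # 1 1 4 9 rather than 9 1 4 1
print(out$vol)   # 55 55 70 95 rather than the wanted 95 55 70 55
```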
Edited by Joshua Ulrich; asked by DQdlM
  • That doesn't seem to maintain the original order of `samp` for some reason though... – DQdlM Jun 21 '12 at 18:54
  • Well crap, I apologize. `sort=FALSE` returns the rows in an "unspecified order". Looks like I need to RTFM. ;-) Bring on the "great comment" up-votes. I like my crow well-done. – Joshua Ulrich Jun 21 '12 at 18:58
  • Thanks for the edits! That is a much clearer description of my problem. – DQdlM Jun 21 '12 at 19:17

2 Answers

You can do this with `match`, subsetting `key` by the result so the rows come back in the order of `samp`:

bottles <- key[match(samp, key$num),]
# row names are odd (the repeated row becomes "1.1") because they must be unique; clean them up
rownames(bottles) <- seq(NROW(bottles))
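Wrapped into the question's `matchFun`, the same idea returns just the volumes. A minimal sketch, again assuming placeholder volumes `seq(55, 100, by = 5)` in place of `sample()`:

```r
samp <- c(9, 1, 4, 1)
key <- data.frame(num = 1:10, vol = seq(55, 100, by = 5))  # placeholder volumes

matchFun <- function(samp, key) {
  # For each bottle in samp, find the row of key holding that number;
  # the positions follow samp's order and repeat for duplicates
  idx <- match(samp, key$num)
  key$vol[idx]  # NA for any bottle number missing from key
}

print(matchFun(samp, key))  # 95 55 70 55
```

Because `match()` returns one position per element of `samp`, both the order and the duplicated bottle 1 survive, and there are no row names to clean up.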
Answered by Joshua Ulrich
  • Joshua, shouldn't it be `match(key$num, samp)` instead? Because according to `?match` the length of the result is the length of the first argument. It is better to use the `%in%` syntax to prevent such a confusion: `key[key$num %in% samp,]`. Also note that you can reset the row names easily just by assigning NULL, no need for explicit sequence creation. – Tomas Aug 15 '13 at 20:39
  • @Tomas: The OP wanted a result the length of `samp`. `match(key$num, samp)` produces several `NA` and `key[key$num %in% samp,]` is missing one row and is in the wrong order. I agree with your `rownames<-` comment. – Joshua Ulrich Aug 15 '13 at 20:54
  • Joshua, I must admit I didn't have enough energy to understand the OP's particular situation. What confuses me is that `match(samp, key$num)` returns an index vector the length of `samp` rather than the number of rows of `key`, which is a bit weird, or at least unusual, when you use it to subset `key`. If this works, it's only in this particular situation... – Tomas Aug 16 '13 at 09:55
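The disagreement in these comments comes down to what each expression returns. A small sketch, with the same placeholder volumes `seq(55, 100, by = 5)`:

```r
samp <- c(9, 1, 4, 1)
key <- data.frame(num = 1:10, vol = seq(55, 100, by = 5))  # placeholder volumes

# match(): integer row positions, one per element of samp (length 4)
pos <- match(samp, key$num)
print(pos)           # 9 1 4 1

# %in%: logical flags, one per row of key (length 10)
hit <- key$num %in% samp
print(which(hit))    # 1 4 9

# Logical subsetting keeps key's order and cannot repeat bottle 1
print(key$vol[hit])  # 55 70 95
# Positional subsetting follows samp and repeats duplicates
print(key$vol[pos])  # 95 55 70 55
```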

`join` in the plyr package is great for this:

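library(plyr)
samp <- c(9, 1, 4, 1)
num <- 1:10
vol <- sample(50:100, 10)
key <- data.frame(num, vol)
# join() matches on shared column names, so rename samp's column to "num"
samp <- as.data.frame(samp)
names(samp) <- "num"
# a right join keeps every row of samp, including the repeated bottle 1
join(key, samp, type = "right")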
Answered by guyabel
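If you want to stay with `merge()` itself, one common base-R workaround is to carry each bottle's original position through the merge and reorder by it afterwards. This is a sketch rather than part of either answer: `matchFunMerge` and the `pos` column are hypothetical names, and the volumes are placeholders.

```r
samp <- c(9, 1, 4, 1)
key <- data.frame(num = 1:10, vol = seq(55, 100, by = 5))  # placeholder volumes

matchFunMerge <- function(samp, key) {
  # Hypothetical helper: remember where each bottle sat in samp
  s <- data.frame(num = samp, pos = seq_along(samp))
  # merge() may reorder rows (sorted by default, unspecified with sort = FALSE)
  out <- merge(s, key, by = "num")
  # Restore samp's order from the saved positions
  out$vol[order(out$pos)]
}

print(matchFunMerge(samp, key))  # 95 55 70 55
```

Unlike `match()`, this inner merge drops bottle numbers that are not in `key`; adding `all.x = TRUE` to the `merge()` call would keep them with `NA` volumes.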