
I have the following data frame:

id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33

Note, they are already sorted by value within each (id,category). What I would like to be able to do is to get the top from each (id,category) and make a string, followed by the second in each (id,category) and so on. So for the above example it would look like

A,D,G,B,E,C,F

Is there a way to do it easily in R? Or am I better off relying on a Perl script to do it?

Thanks much in advance

broccoli
  • Currently your description of what you want and your example don't appear to match. – mnel Nov 15 '12 at 21:38
  • Your desired output is ordered by the order in which the values of category actually appear in your data frame, rather than the values themselves. Is that important? Or would you accept, `D,A,G,E,B,F,C`? – joran Nov 15 '12 at 21:42
  • See http://stackoverflow.com/questions/13279582/select-only-the-first-rows-for-each-unique-value-of-a-column-in-r – G. Grothendieck Nov 16 '12 at 12:55

1 Answer


This appears to work, but I'm certain we could simplify it somewhat, particularly if you are able to relax your ordering requirements:

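library(plyr)
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33", sep = ",", header = TRUE)
# Number the rows within each category (1 = top value in that category)
d <- ddply(d, .(category), transform, r = seq_along(category))
# ddply returns the rows grouped by category; re-sorting by id restores the original row order
d <- arrange(d, id)
# order() is stable, so rows with the same r keep that row order
paste(d$id[order(d$r)], collapse = ",")
# [1] "A,D,G,B,E,C,F"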

This version is probably more robust to ordering, and avoids plyr:

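# Rank rows within each run of identical category values.
# sequence() replaces unlist(sapply(..., seq_len)), which yields a matrix when all runs have the same length.
d$r <- sequence(rle(d$category)$lengths)
# Original row position, used to break ties between equal ranks
d$s <- seq_len(nrow(d))
with(d, paste(id[order(r, s)], collapse = ","))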
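
The rle() approach assumes that each category's rows are contiguous. If that might not hold, a minimal base R sketch along the following lines could be used instead. It ranks rows within each category with ave() and breaks ties by the order in which each category first appears. The column name `g` and the variable `result` are illustrative and not part of the original answer, and the sketch assumes the rows are already sorted by value within each category, as in the question.

```r
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33", sep = ",", header = TRUE)

# Rank within each category by order of appearance (1 = first row of that category)
d$r <- ave(seq_len(nrow(d)), d$category, FUN = seq_along)

# Position of each category in order of first appearance (21 -> 1, 12 -> 2, 23 -> 3)
d$g <- match(d$category, unique(d$category))

# Take all rank-1 rows in category order, then all rank-2 rows, and so on
result <- paste(d$id[order(d$r, d$g)], collapse = ",")
print(result)
```

Replacing `d$g` with `d$category` in the order() call would sort the groups by category value instead of by first appearance, which is the alternative raised in the comments.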
joran