I've encountered an application where I need to sort a data.frame by column numbers, and none of the usual solutions seem to allow that.

The context is creating an as.data.frame.by method. Once melted, a by object has its value in the last column and its index variables in the first ncol-1 columns, but melt returns it sorted backwards: by index 3, then index 2, then index 1. For compatibility with latex.table.by I'd like to sort it forwards, but I'm having trouble doing that in a sufficiently generic way. The commented-out line in the function below is my best attempt so far.

as.data.frame.by <- function( x, colnames=paste("IDX",seq(length(dim(x))),sep="" ), ... ) {
  num.by.vars <- length(dim(x))
  res <- melt(unclass(x))
  res <- na.omit(res)
  colnames(res)[seq(num.by.vars)] <- colnames
  #res <- res[ order(res[ , seq(num.by.vars)] ) , ] # Sort the results by the by vars in the hierarchy given
  res
}

dat <- transform( ChickWeight, Time=cut(Time,3), Chick=cut(as.numeric(Chick),3) )
my.by <- by( dat, with(dat,list(Time,Chick,Diet)), function(x) sum(x$weight) )
> as.data.frame(my.by)
            IDX1         IDX2 IDX3 value
1  (-0.021,6.99] (0.951,17.3]    1  3475
2      (6.99,14] (0.951,17.3]    1  5969
3        (14,21] (0.951,17.3]    1  8002
4  (-0.021,6.99]  (17.3,33.7]    1   640
5      (6.99,14]  (17.3,33.7]    1  1596
6        (14,21]  (17.3,33.7]    1  2900
13 (-0.021,6.99]  (17.3,33.7]    2  2253
14     (6.99,14]  (17.3,33.7]    2  4734
15       (14,21]  (17.3,33.7]    2  7727
22 (-0.021,6.99]  (17.3,33.7]    3   666
23     (6.99,14]  (17.3,33.7]    3  1391
24       (14,21]  (17.3,33.7]    3  2109
25 (-0.021,6.99]    (33.7,50]    3  1647
26     (6.99,14]    (33.7,50]    3  3853
27       (14,21]    (33.7,50]    3  7488
34 (-0.021,6.99]    (33.7,50]    4  2412
35     (6.99,14]    (33.7,50]    4  5448
36       (14,21]    (33.7,50]    4  8101

With the line uncommented, it returns gibberish: order treats the whole data.frame as a single vector, with disastrous results.

I've even tried clever stuff like res <- res[ order( ...=list(res[,1],res[,2]) ) , ] but to no avail.

I suspect there's a simple way to do this, but I'm not seeing it.

Edit for clarification: I want to not have to specify column names. Instead, I want to be able to sort it by a numerical vector (e.g. sort by columns 1:4).

Ari B. Friedman

1 Answer

mydf <- as.data.frame(my.by)
mydf[order(mydf$IDX3, mydf$IDX2, mydf$IDX1) , ]
            IDX1         IDX2 IDX3 value
1  (-0.021,6.99] (0.951,17.3]    1  3475
3        (14,21] (0.951,17.3]    1  8002
2      (6.99,14] (0.951,17.3]    1  5969
4  (-0.021,6.99]  (17.3,33.7]    1   640
6        (14,21]  (17.3,33.7]    1  2900
5      (6.99,14]  (17.3,33.7]    1  1596
13 (-0.021,6.99]  (17.3,33.7]    2  2253
15       (14,21]  (17.3,33.7]    2  7727
14     (6.99,14]  (17.3,33.7]    2  4734
22 (-0.021,6.99]  (17.3,33.7]    3   666
24       (14,21]  (17.3,33.7]    3  2109
23     (6.99,14]  (17.3,33.7]    3  1391
25 (-0.021,6.99]    (33.7,50]    3  1647
27       (14,21]    (33.7,50]    3  7488
26     (6.99,14]    (33.7,50]    3  3853
34 (-0.021,6.99]    (33.7,50]    4  2412
36       (14,21]    (33.7,50]    4  8101
35     (6.99,14]    (33.7,50]    4  5448

Or:

my.by <- by( dat, with(dat,list(Diet,Chick, Time)), function(x) sum(x$weight) )
mydf <- as.data.frame(my.by)

EDIT: Or this produces the same output as above, using numeric column indices:

mydf <- as.data.frame(my.by)
mydf[ do.call(order, mydf[, 3:1] ) , ]
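The do.call approach can be wrapped into a small reusable function. This is only a sketch under stated assumptions: `sort_by_cols` and the `toy` data frame are hypothetical names introduced for illustration, and the toy data stands in for the melted by result so that the example runs in base R without `melt`.

```r
# Hypothetical helper (not part of the original answer): sort a data.frame
# by column positions, with earlier positions taking priority.
sort_by_cols <- function(df, cols) {
  # unname() drops the column names so that none of them can be mistaken
  # for a named argument of order(), such as `decreasing`.
  keys <- unname(as.list(df[cols]))
  df[do.call(order, keys), , drop = FALSE]
}

# Toy data standing in for the melted `by` result.
toy <- data.frame(
  IDX1  = c("b", "a", "b", "a"),
  IDX2  = c(2, 2, 1, 1),
  value = c(10, 20, 30, 40)
)

sorted_fwd <- sort_by_cols(toy, 1:2)  # IDX1 first, then IDX2
sorted_rev <- sort_by_cols(toy, 2:1)  # IDX2 first, then IDX1
print(sorted_fwd)
```

Inside as.data.frame.by, the same idea would presumably replace the commented-out line with something like `res[ do.call(order, unname(as.list(res[seq(num.by.vars)]))) , ]`, which sorts forwards by the index columns without naming them.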
IRTFM
  • Sorry should have been clearer: I want to not have to specify column names. Instead, I want to be able to sort it by a numerical vector (e.g. sort by columns 1:4). – Ari B. Friedman Oct 11 '11 at 13:37
  • See above. The do.call method of passing dataframes to `order` is illustrated on `help(order)` page. – IRTFM Oct 11 '11 at 18:20
  • Nice. Thanks. I need to look more closely into `do.call`, as I suspect it would solve many of my problems :-) – Ari B. Friedman Oct 11 '11 at 18:45
  • Yes. It took me a couple of years to understand that `do.call` was the answer to many of my problems as well. `do.call` does for functions what `get` and `paste` do for data-objects, converting character representations into language objects, and allowing multiple values to construct an evaluated expression. – IRTFM Oct 11 '11 at 18:50
  • So can I think of `do.call` as being useful whenever I need to pass something to `...`? – Ari B. Friedman Oct 11 '11 at 18:56
  • I would need to see a bit more context. Are you talking about catching `...` arguments inside a function? In that case you generally need to do something like `lisargs <- list(...)` and then work with 'lisargs'. – IRTFM Oct 11 '11 at 19:54
  • No, the other way around--passing a list to a function which catches comma-separated arguments into a list in the manner you described (see the first line of `order`). – Ari B. Friedman Oct 11 '11 at 22:59
  • I think the answer is yes. `order` doesn't deal well on its own with arguments supplied as a list. You get the uninformative error message asking if "you called sort on a list." – IRTFM Oct 11 '11 at 23:14
  • Thanks. (He said, thereby ignoring the automated message to please move the discussion to chat) :-) – Ari B. Friedman Oct 11 '11 at 23:25
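
The point discussed in the comments, passing a list to a function that collects its arguments through `...`, can be illustrated with a minimal sketch. The vectors `x` and `y` and the other variable names are made up for this example.

```r
x <- c(2, 1, 2, 1)
y <- c("b", "b", "a", "a")
keys <- list(x, y)

# Passing the list directly hands order() a single list argument,
# which it cannot sort; capture the error instead of stopping.
direct <- tryCatch(order(keys), error = function(e) "error")

# do.call() spreads the list elements out as separate arguments,
# so this call is equivalent to order(x, y).
spread <- do.call(order, keys)

# Named list elements become named arguments of the called function.
pasted <- do.call(paste, list("a", "b", sep = "-"))
```

Here `spread` matches `order(x, y)`, while `direct` only records that the direct call failed.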