
The general notation for sorting a data frame with the order function (here, descending on two numeric columns) is:

myData.sorted = myData[ order(-myData[,date.idx],-myData[,(1+date.idx)]), ];
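A minimal illustration of that pattern, using a small hypothetical data frame `toy` (not from the question). Negating a numeric key makes `order()` sort it in descending order. Negation only works for numeric keys, but `xtfrm()` turns other sortable vectors (for example character) into a numeric key that can be negated the same way.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  year = c(2010, 2011, 2010, 2009),
  week = c(5, 3, 9, 7),
  name = c("b", "a", "c", "d"),
  stringsAsFactors = FALSE
)

# Descending on year, then descending on week: negate each numeric key
toy.desc <- toy[order(-toy$year, -toy$week), ]
print(toy.desc)

# Character keys cannot be negated directly; xtfrm() gives a numeric
# key with the same ordering, which can then be negated
toy.name.desc <- toy[order(-xtfrm(toy$name)), ]
print(toy.name.desc)
```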

I want to sort on a variable number of numeric columns (n.cols), in the order they are passed into a function, each with its own optional direction:

sortDataFrameByNumericColumns = function (ddf, mycols, direction="DESC")
    {
    n.cols = length(mycols);
    n.dirs = length(direction);
    sdf = ddf;
    
    vecs = matrix(NA, nrow=dim(sdf)[1],ncol=n.cols);
    
    for(i in 1:n.cols)
        {
        idx = which( names(sdf)== mycols[i] );
        dir = if(n.dirs==1) { direction } else { direction[i]};
        
        if(dir == "ASC")
            {
            vecs[,i] = sdf[,idx];
            } else {  
                    # DESC
                    vecs[,i] = -sdf[,idx];
                    }       
        }   
        
    #########################################
    ## how I want it (doesn't work)
    #fdf = sdf[order(vecs), ];

    #########################################
    ## non-variadic approach (works, but only for exactly 3 columns)
    fdf = sdf[order( vecs[,1],vecs[,2],vecs[,3] ), ];

    fdf;
    }
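The commented-out `order(vecs)` line fails because `order()` receives the matrix as a single argument, and a matrix is just one vector with a `dim` attribute. `order()` therefore returns a permutation of all `nrow * ncol` cells rather than of the rows. A small sketch with a hypothetical 3 × 2 key matrix `m`:

```r
# Hypothetical 3 x 2 matrix of sort keys (one column per key)
m <- cbind(c(2, 1, 2), c(1, 5, 0))

# Passing the whole matrix: order() sees ONE vector of 6 values
o.cells <- order(m)
print(length(o.cells))   # 6, not 3: it indexes cells, not rows

# Passing each column as its own argument: one index per row,
# ties in column 1 broken by column 2
o.rows <- order(m[, 1], m[, 2])
print(o.rows)
```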

# basic usage
mycols = c("year","week","day");
fdf = sortDataFrameByNumericColumns (ddf,mycols,"ASC");  # sort all cols ASC

                          md5_email year week day V01
7  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6
5  15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
6  3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
8  8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
10 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4
2  db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
4  4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
9  3605e776744be0d11583305b0ede6419 2013   40 280 4.2
1  06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
3  c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0


# basic usage
fdf = sortDataFrameByNumericColumns (ddf,mycols,"DESC");  # sort all cols DESC


                          md5_email year week day V01
3  c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0
1  06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
9  3605e776744be0d11583305b0ede6419 2013   40 280 4.2
4  4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
2  db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
10 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4
8  8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
6  3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
5  15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
7  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6

# basic usage
mydirs = c("ASC","DESC","ASC");
fdf = sortDataFrameByNumericColumns (ddf,mycols,mydirs);  # custom direction on each column ...

                          md5_email year week day V01
7  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6
5  15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
10 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4
8  8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
6  3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
4  4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
2  db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
9  3605e776744be0d11583305b0ede6419 2013   40 280 4.2
1  06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
3  c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0

I am using the order function as the engine. From what I have read in other posts, it is the fastest way to perform this operation. The manual states that the value I am passing in (currently the matrix vecs) needs to be a sequence of vectors. What does that mean?

?order

Argument `...`: a sequence of numeric, complex, character or logical vectors, all of the same length, or a classed R object.
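In R's help pages, "a sequence of vectors" for `...` means zero or more separate arguments, each one a vector, as in `order(a, b, c)`. The sketch below uses a hypothetical helper `count.args()` that only reports how many arguments landed in `...`. A matrix passed as one argument counts as one, however many columns it has.

```r
# Hypothetical helper: report how many arguments were captured by `...`
count.args <- function(...) length(list(...))

m <- matrix(1:6, ncol = 3)

print(count.args(m))                       # the whole matrix is one argument
print(count.args(m[, 1], m[, 2], m[, 3]))  # one argument per column
```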

It needs a sequence of equal-length vectors, but what I have is a matrix, vecs. How do I cast its columns to a sequence of vectors? That is the primary question.

So this works, but it is not variadic:

fdf = sdf[order(vecs[,1],vecs[,2],vecs[,3]), ];

If I could somehow cast vecs as vecs[,1],vecs[,2],vecs[,3] variadically, that would be the solution. I recognize do.call may be another approach, but I am specifically trying to understand the ... notation of the base::order function.
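Because `...` is filled from the arguments written in the call itself, base R has no syntax for "unpacking" a matrix into it. `do.call()` instead builds the call from a list, and each list element becomes one argument. A sketch, assuming a numeric key matrix `vecs` shaped like the one built in the function above (the values here are hypothetical): split the matrix into a list of its columns, then hand that list to `order()`.

```r
# Hypothetical key matrix: one column per sort key (already sign-flipped
# for DESC columns, as in the question's loop)
vecs <- cbind(c(2010, 2009, 2010, 2011),
              c(-9, -35, -31, -16))

# Turn the matrix into an (unnamed) list of its column vectors...
cols.list <- lapply(seq_len(ncol(vecs)), function(j) vecs[, j])

# ...and let do.call() pass each list element as a separate argument.
# Equivalent to order(vecs[, 1], vecs[, 2]), but for any number of columns.
o.variadic <- do.call(order, cols.list)
o.fixed    <- order(vecs[, 1], vecs[, 2])
print(identical(o.variadic, o.fixed))
```

`as.data.frame(vecs)` would also work in place of the `lapply()` call, since a data frame is a list of columns (named `V1`, `V2`, ..., which do not clash with `order()`'s own argument names).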

Here is a sample test case of the data frame:

 x = sdf[sample(1:838,10),1:5];

 x
                           md5_email year week day V01
733 06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
546 db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
811 c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0
585 4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
249 15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
344 3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
96  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6
346 8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
717 3605e776744be0d11583305b0ede6419 2013   40 280 4.2
410 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4


And here it is in text format (copy this text with Ctrl+C, then run the command below):

"md5_email"|"year"|"week"|"day"|"V01"
"06da8174757feffd764c7232f965cd7a"|2015|4|28|3.4
"db539502caf70a3074ac646d21198f5a"|2011|16|111|3.4
"c29e24b16f1c8c6e897b42b45dee9297"|2019|2|17|5
"4ee5096244e139d1d87eeaa0bef29d71"|2011|21|143|1
"15712907fc659a6714e06659256aa0a2"|2009|35|244|2.6
"3ec0f0a866eeb8e0b419cccd6ea807b5"|2010|9|60|4.2
"1768a550126bbf820dd89edecb92895c"|2008|29|207|2.6
"8f2a765187594755f64c8d11bf34a3cc"|2010|10|67|3.4
"3605e776744be0d11583305b0ede6419"|2013|40|280|4.2
"3b87bffacdd35679a992eadf816120a2"|2010|31|216|3.4

Then read it from the clipboard:

x = read.table(file = "clipboard", sep = "|", header=TRUE);
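If the clipboard is not available (for example on a headless machine), `read.table()` also accepts the data as a string through its `text` argument. The sketch below does that, then rewrites the question's function so that the final `order()` call uses `do.call()` over the key columns instead of the hard-coded `vecs[,1],vecs[,2],vecs[,3]`. The name `sortByCols` is hypothetical, and the sketch assumes every column in `mycols` is numeric.

```r
sample.txt <- '"md5_email"|"year"|"week"|"day"|"V01"
"06da8174757feffd764c7232f965cd7a"|2015|4|28|3.4
"db539502caf70a3074ac646d21198f5a"|2011|16|111|3.4
"c29e24b16f1c8c6e897b42b45dee9297"|2019|2|17|5
"4ee5096244e139d1d87eeaa0bef29d71"|2011|21|143|1
"15712907fc659a6714e06659256aa0a2"|2009|35|244|2.6
"3ec0f0a866eeb8e0b419cccd6ea807b5"|2010|9|60|4.2
"1768a550126bbf820dd89edecb92895c"|2008|29|207|2.6
"8f2a765187594755f64c8d11bf34a3cc"|2010|10|67|3.4
"3605e776744be0d11583305b0ede6419"|2013|40|280|4.2
"3b87bffacdd35679a992eadf816120a2"|2010|31|216|3.4'

# Hypothetical variadic version of sortDataFrameByNumericColumns
sortByCols <- function(ddf, mycols, direction = "DESC") {
  # Recycle a single direction to every column
  direction <- rep_len(direction, length(mycols))
  # One key vector per column, negated unless "ASC"
  # (anything other than "ASC" is treated as DESC, as in the question)
  keys <- lapply(seq_along(mycols), function(i) {
    v <- ddf[[mycols[i]]]
    if (direction[i] == "ASC") v else -v
  })
  # Each list element becomes a separate argument to order()
  ddf[do.call(order, keys), ]
}

x <- read.table(text = sample.txt, sep = "|", header = TRUE)
mycols <- c("year", "week", "day")
x.asc   <- sortByCols(x, mycols, "ASC")
x.mixed <- sortByCols(x, mycols, c("ASC", "DESC", "ASC"))
print(x.mixed)
```

Row names here are 1 to 10 (the line order of the text), not the original 733, 546, ... from the sampled data frame.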
mshaffer
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. How do you need to call this function? – MrFlick Sep 08 '20 at 02:36

1 Answer


I think what you are looking for can be achieved with do.call.

Subset the data frame to the columns you want to sort on, and apply order to them with do.call. Build a vector of multipliers (1 for ASC, -1 for DESC) from the direction value(s) passed in, and multiply each column by its entry. Finally, use the ordering returned by order to index the rows, which sorts them by the columns in the order given.

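sortDataFrameByNumericColumns <- function(ddf, mycols, direction="DESC") {
  # Multiplier per column: 1 for ASC, -1 for DESC
  # (a single direction value is recycled to every column)
  newvec <- integer(length(mycols))
  newvec[direction == 'ASC'] <- 1
  newvec[direction == 'DESC'] <- -1
  # Flip the sign of the DESC columns, then pass each column to order()
  # as a separate argument via do.call()
  ddf[do.call(order, sweep(ddf[mycols], 2, newvec, `*`)), ]
}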
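An alternative that avoids negating the keys (so it also works for non-numeric columns) is the `decreasing` argument of `order()`: with `method = "radix"`, `decreasing` may be a logical vector with one entry per key. A sketch along those lines; the function name `sortDataFrameRadix` and the small data frame `d` are hypothetical. Per `?sort`, the radix method sorts strings in C-locale order, which can differ from the default locale-aware collation.

```r
sortDataFrameRadix <- function(ddf, mycols, direction = "DESC") {
  # One logical per key column: TRUE where that column should be DESC
  decr <- rep_len(direction, length(mycols)) == "DESC"
  # unname() keeps column names from being matched to order()'s own
  # arguments (e.g. a column called "method")
  keys <- unname(as.list(ddf[mycols]))
  ddf[do.call(order, c(keys, list(decreasing = decr, method = "radix"))), ]
}

# Hypothetical test data, shaped like a few rows of the question's df
d <- data.frame(year = c(2011L, 2010L, 2011L, 2010L),
                week = c(16L, 9L, 21L, 31L),
                tag  = c("b", "a", "c", "d"),
                stringsAsFactors = FALSE)

# year ascending, then week descending
d.mixed <- sortDataFrameRadix(d, c("year", "week"), c("ASC", "DESC"))
print(d.mixed)

# Works on a character key too, since nothing is negated
d.tag <- sortDataFrameRadix(d, "tag", "DESC")
print(d.tag)
```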

Test the function on different inputs.

mycols = c("year","week","day")
fdf = sortDataFrameByNumericColumns (df,mycols,"ASC")
fdf
#                           md5_email year week day V01
#96  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6
#249 15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
#344 3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
#346 8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
#410 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4
#546 db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
#585 4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
#717 3605e776744be0d11583305b0ede6419 2013   40 280 4.2
#733 06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
#811 c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0

fdf = sortDataFrameByNumericColumns (df,mycols,"DESC")
fdf

#                           md5_email year week day V01
#811 c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0
#733 06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
#717 3605e776744be0d11583305b0ede6419 2013   40 280 4.2
#585 4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
#546 db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
#410 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4
#346 8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
#344 3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
#249 15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
#96  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6

mydirs = c("ASC","DESC","ASC")
fdf = sortDataFrameByNumericColumns (df,mycols,mydirs)
fdf
#                           md5_email year week day V01
#96  1768a550126bbf820dd89edecb92895c 2008   29 207 2.6
#249 15712907fc659a6714e06659256aa0a2 2009   35 244 2.6
#410 3b87bffacdd35679a992eadf816120a2 2010   31 216 3.4
#346 8f2a765187594755f64c8d11bf34a3cc 2010   10  67 3.4
#344 3ec0f0a866eeb8e0b419cccd6ea807b5 2010    9  60 4.2
#585 4ee5096244e139d1d87eeaa0bef29d71 2011   21 143 1.0
#546 db539502caf70a3074ac646d21198f5a 2011   16 111 3.4
#717 3605e776744be0d11583305b0ede6419 2013   40 280 4.2
#733 06da8174757feffd764c7232f965cd7a 2015    4  28 3.4
#811 c29e24b16f1c8c6e897b42b45dee9297 2019    2  17 5.0

data

df <- structure(list(md5_email = c("06da8174757feffd764c7232f965cd7a", 
"db539502caf70a3074ac646d21198f5a", "c29e24b16f1c8c6e897b42b45dee9297", 
"4ee5096244e139d1d87eeaa0bef29d71", "15712907fc659a6714e06659256aa0a2", 
"3ec0f0a866eeb8e0b419cccd6ea807b5", "1768a550126bbf820dd89edecb92895c", 
"8f2a765187594755f64c8d11bf34a3cc", "3605e776744be0d11583305b0ede6419", 
"3b87bffacdd35679a992eadf816120a2"), year = c(2015L, 2011L, 2019L, 
2011L, 2009L, 2010L, 2008L, 2010L, 2013L, 2010L), week = c(4L, 
16L, 2L, 21L, 35L, 9L, 29L, 10L, 40L, 31L), day = c(28L, 111L, 
17L, 143L, 244L, 60L, 207L, 67L, 280L, 216L), V01 = c(3.4, 3.4, 
5, 1, 2.6, 4.2, 2.6, 3.4, 4.2, 3.4)), class = "data.frame", row.names = c("733", 
"546", "811", "585", "249", "344", "96", "346", "717", "410"))
Ronak Shah
  • This is a nice answer, but I don't see how it ties into each column having its own optional "direction". This shouldn't be hard: `?order` says to pass in a sequence of vectors. I currently have a matrix where each column is a vector; how do I 'sequencify' it? Alternatively, `do.call` seems viable if the 'direction' can be applied to each column independently. Ideally, I would like to understand how to variadically pass the columns of a matrix as a sequence of vectors that the order function can handle. – mshaffer Sep 08 '20 at 03:04
  • Would each column have their own `direction` ? Or there will be only one value of `direction` that will be applied to all the columns? Can you update your post to include different types of input this function can take and show corresponding output that it should produce with the data that you have in your post? – Ronak Shah Sep 08 '20 at 03:16
  • it's there in the direction flag of the function, but I will clarify. – mshaffer Sep 08 '20 at 03:19
  • @mshaffer Check updated answer. – Ronak Shah Sep 08 '20 at 03:47
  • I recognize `do.call` may be another approach, but I am specifically trying to understand the `...` notation of the `base::order` function. How can I cast a matrix (its columns) as a sequence of vectors that will play nicely with the `base::order` function? – mshaffer Sep 08 '20 at 16:40
  • Sorry, I don't exactly understand your question. I don't know which matrix you are talking about. I can't find the use of a matrix or its columns here. – Ronak Shah Sep 08 '20 at 22:45
  • That's fine, I found the solution in another post. Your approach does work. – mshaffer Sep 09 '20 at 01:40