
I'm trying to understand how the order() function works. I was under the impression that it returned a permutation of indices, which when sorted, would sort the original vector.

For instance,

> a <- c(45,50,10,96)
> order(a)
[1] 3 1 2 4

I would have expected this to return c(2, 3, 1, 4), since the list sorted would be 10 45 50 96.

Can someone help me understand the return value of this function?

Petter Friberg
jeffshantz

7 Answers


This seems to explain it.

The definition of order is that a[order(a)] is in increasing order. This works with your example, where the correct order is the third, first, second, then fourth element.

You may have been looking for rank, which returns the rank of the elements
R> a <- c(4.1, 3.2, 6.1, 3.1)
R> order(a)
[1] 4 2 1 3
R> rank(a)
[1] 3 2 4 1
So rank tells you what order the numbers are in; order tells you how to get them in ascending order.
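A minimal sketch checking both properties on the same vector: indexing with order(a) sorts a, and indexing the sorted vector with rank(a) puts every value back where it came from. The second check assumes there are no ties.

```r
a <- c(4.1, 3.2, 6.1, 3.1)

o <- order(a)   # positions of the smallest, second smallest, ... element
r <- rank(a)    # where each element lands in the sorted vector

# order(a) tells you how to sort a
stopifnot(identical(a[o], sort(a)))

# rank(a) tells you where each value sits in sort(a) (assumes no ties)
stopifnot(identical(sort(a)[r], a))

print(o)  # 4 2 1 3
print(r)  # 3 2 4 1
```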

plot(a, rank(a)/length(a)) will give a graph of the CDF. To see why order is useful, though, try plot(a, rank(a)/length(a), type="S"), which gives a mess because the data are not in increasing order.

If you did
oo<-order(a)
plot(a[oo],rank(a[oo])/length(a),type="S")
or simply
oo<-order(a)
plot(a[oo], (1:length(a))/length(a), type="S")
you get a line graph of the CDF.
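As a cross-check, base R's ecdf() builds the same step function. A minimal sketch, assuming the values in a are distinct, confirming that the heights used in the step plot, (1:length(a))/length(a) at the sorted points, match ecdf(a) evaluated at those points:

```r
a <- c(4.1, 3.2, 6.1, 3.1)
oo <- order(a)
n <- length(a)

# heights used in the step plot: 1/n, 2/n, ..., 1
heights <- (1:n) / n

# ecdf() returns the empirical CDF as a function; evaluate it at the sorted data
Fn <- ecdf(a)
stopifnot(isTRUE(all.equal(Fn(a[oo]), heights)))
```

plot(ecdf(a)) draws the step function directly, without needing order() at all.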

I'll bet you're thinking of rank.

duffymo
  • Ahh.. I see now. order() returns the indices of the vector in sorted order. Wonderful, thanks very much. – jeffshantz Feb 23 '10 at 02:27
  • `order(a, decreasing = T)` and `rank(a)` will return an equivalent answer. – omar May 24 '16 at 11:10
  • I am having problem with order. `a<-c(4,2,1,80,13)` Then `order(a)` should be `3 4 5 1 2`, but strangely I am getting `3 2 1 5 4` – Shoham Debnath May 27 '16 at 11:42
  • @duffymo a little help here would be really appreciated. When is `rank` and `order` same? – Shoham Debnath May 27 '16 at 11:45
  • Actually, `order(order(a))` will return the same as `rank(a)` *if* there are no ties. If there are then it will return the same as `rank(a, ties.method="first")`. – jac Aug 09 '16 at 21:38
  • Suppose `s = sort(a)`, could we then say: `s == a[order(a)] ` and, provided there are no ties, `a == s[rank(a)]`? (not sure about the R syntax) – djvg Jun 16 '21 at 12:51
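A quick check of the tie behaviour described in the comments, using a made-up vector in which 10 appears twice: order(order(a)) matches rank(a, ties.method = "first"), while the default rank() averages the tied positions.

```r
a <- c(30, 10, 20, 10)   # hypothetical data with a tie

rank(a)                          # 4.0 1.5 3.0 1.5 (ties averaged)
rank(a, ties.method = "first")   # 4 1 3 2 (ties broken by position)
order(order(a))                  # 4 1 3 2

stopifnot(order(order(a)) == rank(a, ties.method = "first"),
          any(order(order(a)) != rank(a)))
```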

To sort a 1D vector or a single column of data, just call the sort function and pass in your sequence.

On the other hand, the order function is necessary to sort two-dimensional data, i.e., multiple columns of data collected in a matrix or dataframe.

    Stadium Home Week Qtr Away Off Def Result       Kicker Dist
751     Out  PHI   14   4  NYG PHI NYG   Good      D.Akers   50
491     Out   KC    9   1  OAK OAK  KC   Good S.Janikowski   32
702     Out  OAK   15   4  CLE CLE OAK   Good     P.Dawson   37
571     Out   NE    1   2  OAK OAK  NE Missed S.Janikowski   43
654     Out  NYG   11   2  PHI NYG PHI   Good      J.Feely   26
307     Out  DEN   14   2  BAL DEN BAL   Good       J.Elam   48
492     Out   KC   13   3  DEN  KC DEN   Good      L.Tynes   34
691     Out  NYJ   17   3  BUF NYJ BUF   Good     M.Nugent   25
164     Out  CHI   13   2   GB CHI  GB   Good      R.Gould   25
80      Out  BAL    1   2  IND IND BAL   Good M.Vanderjagt   20

Here is an excerpt of data for field goal attempts in the 2008 NFL season, a dataframe I've called 'fg'. Suppose that these 10 data points represent all of the field goals attempted in 2008. Further suppose you want to know the distance of the longest field goal attempted that year, who kicked it, and whether it was good or not; you also want to know the second-longest, the third-longest, etc.; and finally you want the shortest field goal attempt.
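To follow along in R, the 10-row excerpt can be typed back in as a data frame. This is a transcription of the table above (the printed row labels become row names), not the original data source:

```r
# transcribed from the printed excerpt; row names are the original row labels
fg <- data.frame(
  Stadium = rep("Out", 10),
  Home    = c("PHI", "KC", "OAK", "NE", "NYG", "DEN", "KC", "NYJ", "CHI", "BAL"),
  Week    = c(14, 9, 15, 1, 11, 14, 13, 17, 13, 1),
  Qtr     = c(4, 1, 4, 2, 2, 2, 3, 3, 2, 2),
  Away    = c("NYG", "OAK", "CLE", "OAK", "PHI", "BAL", "DEN", "BUF", "GB", "IND"),
  Off     = c("PHI", "OAK", "CLE", "OAK", "NYG", "DEN", "KC", "NYJ", "CHI", "IND"),
  Def     = c("NYG", "KC", "OAK", "NE", "PHI", "BAL", "DEN", "BUF", "GB", "BAL"),
  Result  = c("Good", "Good", "Good", "Missed", "Good",
              "Good", "Good", "Good", "Good", "Good"),
  Kicker  = c("D.Akers", "S.Janikowski", "P.Dawson", "S.Janikowski", "J.Feely",
              "J.Elam", "L.Tynes", "M.Nugent", "R.Gould", "M.Vanderjagt"),
  Dist    = c(50, 32, 37, 43, 26, 48, 34, 25, 25, 20),
  row.names = c("751", "491", "702", "571", "654", "307", "492", "691", "164", "80")
)

str(fg)
```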

Well, you could just do this:

sort(fg$Dist, decreasing=TRUE)

which returns: 50 48 43 37 34 32 26 25 25 20

That is correct, but not very useful. It does tell us the distance of the longest field goal attempt, the second-longest, ..., as well as the shortest; however, that's all we know. For example, we don't know who the kicker was, whether the attempt was successful, etc. What we need is the entire dataframe sorted on the "Dist" column (put another way, we want to sort all of the data rows on the single attribute Dist). That would look like this:

    Stadium Home Week Qtr Away Off Def Result       Kicker Dist
751     Out  PHI   14   4  NYG PHI NYG   Good      D.Akers   50
307     Out  DEN   14   2  BAL DEN BAL   Good       J.Elam   48
571     Out   NE    1   2  OAK OAK  NE Missed S.Janikowski   43
702     Out  OAK   15   4  CLE CLE OAK   Good     P.Dawson   37
492     Out   KC   13   3  DEN  KC DEN   Good      L.Tynes   34
491     Out   KC    9   1  OAK OAK  KC   Good S.Janikowski   32
654     Out  NYG   11   2  PHI NYG PHI   Good      J.Feely   26
691     Out  NYJ   17   3  BUF NYJ BUF   Good     M.Nugent   25
164     Out  CHI   13   2   GB CHI  GB   Good      R.Gould   25
80      Out  BAL    1   2  IND IND BAL   Good M.Vanderjagt   20

This is what order does. It is 'sort' for two-dimensional data; put another way, it returns a 1D integer vector of row numbers such that reordering the rows according to that vector gives a correct row-oriented sort on the column Dist.

Here's how it works. Above, sort was used to sort the Dist column; to sort the entire dataframe on the Dist column, we use 'order' exactly the same way as 'sort' is used above:

ndx = order(fg$Dist, decreasing=TRUE)

(I usually bind the vector returned from 'order' to the variable 'ndx', which stands for 'index', because I am going to use it as an index array to sort.)
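To see what ndx actually holds, here is a minimal sketch using only the Dist column of the excerpt (values copied from the table). Note that order() returns positions 1 to 10, not the printed row labels such as 751:

```r
Dist <- c(50, 32, 37, 43, 26, 48, 34, 25, 25, 20)   # fg$Dist from the excerpt

ndx <- order(Dist, decreasing = TRUE)
ndx   # 1 6 4 3 7 2 5 8 9 10 (the tied 25s keep their original order)

# indexing with ndx gives the same values as sort(..., decreasing = TRUE)
stopifnot(identical(Dist[ndx], sort(Dist, decreasing = TRUE)))
```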

That was step 1; here's step 2:

'ndx', what is returned by 'order', is then used as an index array to re-order the dataframe, 'fg':

fg_sorted = fg[ndx,]

fg_sorted is the re-ordered dataframe immediately above.
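Both steps together, sketched on a cut-down copy of the excerpt (only the Result, Kicker and Dist columns, with the original row labels; the other columns are carried along the same way). The comma in fg[ndx, ] matters: ndx selects rows, and the empty column index keeps every column.

```r
fg <- data.frame(
  Result = c("Good", "Good", "Good", "Missed", "Good",
             "Good", "Good", "Good", "Good", "Good"),
  Kicker = c("D.Akers", "S.Janikowski", "P.Dawson", "S.Janikowski", "J.Feely",
             "J.Elam", "L.Tynes", "M.Nugent", "R.Gould", "M.Vanderjagt"),
  Dist   = c(50, 32, 37, 43, 26, 48, 34, 25, 25, 20),
  row.names = c("751", "491", "702", "571", "654", "307", "492", "691", "164", "80")
)

ndx <- order(fg$Dist, decreasing = TRUE)   # step 1: row positions, longest first
fg_sorted <- fg[ndx, ]                     # step 2: reorder whole rows

print(fg_sorted)
rownames(fg_sorted)   # "751" "307" "571" "702" "492" "491" "654" "691" "164" "80"
```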

In sum, 'order' is used to create an index array (which specifies the sort order of the column you want sorted), which is then used to re-order the rows of the dataframe (or matrix).

doug
  • -1: order makes pretty good sense for a vector. The basic property of order--that a[order(a)] is sorted--is not clearly stated. – Jyotirmoy Bhattacharya Feb 23 '10 at 03:32
  • Wrong. you need to look again--the 'basic property' is indeed shown very clearly in the two (grey-background) lines of code above. Because sorting w/ 'order' is two separate operations, i showed this using two lines of code--one creating the index vector and the second line using that index to perform the sort. The OP asked for an explanation not just a result, and i gave him one, as evidenced by the fact that he selected my answer and wrote the brief note above "Thanks [m]akes perfect sense". I even bound the final result to a variable called "fg_sorted". – doug Apr 03 '10 at 16:19

(I thought it might be helpful to lay out the ideas very simply here to summarize the good material posted by @doug and linked by @duffymo; +1 to each, btw.)

?order tells you which element of the original vector needs to be put first, second, etc., so as to sort the original vector, whereas ?rank tells you, for each element, whether it has the lowest, second lowest, etc., value. For example:

> a <- c(45, 50, 10, 96)
> order(a)  
[1] 3 1 2 4  
> rank(a)  
[1] 2 3 1 4  

So order(a) is saying, 'put the third element first when you sort... ', whereas rank(a) is saying, 'the first element is the second lowest... '. (Note that they both agree on which element is lowest, etc.; they just present the information differently.) Thus we see that we can use order() to sort, but we can't use rank() that way:

> a[order(a)]  
[1] 10 45 50 96  
> sort(a)  
[1] 10 45 50 96  
> a[rank(a)]  
[1] 50 10 45 96  
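Another way to phrase the difference, assuming the values are distinct: order(a) answers "where in a is each sorted value?", while rank(a) answers "where in the sorted vector is each value of a?". Both questions can be asked with match():

```r
a <- c(45, 50, 10, 96)
s <- sort(a)

# order: for each slot of the sorted vector, which element of a goes there
stopifnot(match(s, a) == order(a))   # 3 1 2 4

# rank: for each element of a, which slot of the sorted vector it occupies
stopifnot(match(a, s) == rank(a))    # 2 3 1 4
```

With ties this breaks down, because match() always returns the first matching position.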

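In general, order() will not equal rank(). They are guaranteed to agree when the vector has been sorted already (more generally, they agree whenever the sorting permutation is its own inverse):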

> b <- sort(a)  
> order(b)==rank(b)  
[1] TRUE TRUE TRUE TRUE  
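A sorted vector is not the only case where the two agree: they coincide exactly when the sorting permutation is its own inverse, e.g. when a sorted vector has one pair of elements swapped. A sketch with two made-up vectors:

```r
swap  <- c(20, 10, 30)   # hypothetical: first two elements swapped
cycle <- c(30, 10, 20)   # hypothetical: a three-element rotation

order(swap); rank(swap)     # 2 1 3 and 2 1 3: equal, though unsorted
order(cycle); rank(cycle)   # 2 3 1 and 3 1 2: different

stopifnot(all(order(swap) == rank(swap)),
          any(order(cycle) != rank(cycle)))
```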

Also, since order() is (essentially) operating over the ranks of the data, applying order() to rank(a) gives the same result as order(a), but composing them the other way around produces gibberish:

> order(rank(a))==order(a)  
[1] TRUE TRUE TRUE TRUE  
> rank(order(a))==rank(a)  
[1] FALSE FALSE FALSE  TRUE  
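Why one composition works and the other doesn't: for distinct values, order(a) and rank(a) are inverse permutations. Composing permutations in R means indexing one by the other, not calling one function on the other's output. A minimal sketch on the same vector:

```r
a <- c(45, 50, 10, 96)
o <- order(a)   # 3 1 2 4
r <- rank(a)    # 2 3 1 4

# composing two permutations = indexing one by the other;
# both compositions give the identity permutation 1 2 3 4
stopifnot(o[r] == seq_along(a),
          r[o] == seq_along(a))

# order() of a permutation returns its inverse, so order(rank(a)) == order(a)
stopifnot(order(r) == o)

# rank() of a permutation is the permutation itself, so rank(order(a)) is
# just order(a) again, which is why it does not reproduce rank(a)
stopifnot(rank(o) == o)
```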
gung - Reinstate Monica
  • `order` and `rank` are actually inverses of each other (at least as long as the values in `a` are unique). If you imagine each had names(/labels) ('1','2','3','4') on their values, then the values of `order(a)` tells you what position in `rank(a)` each label occurs in (e.g. the 1st value of `order(a)` (3) tells you that '1' occurs in the 3rd position of `rank(a)`, and vice versa (e.g. the 2nd value of `rank(a)` (3) tells you that '2' occurs in the 3rd position of `order(a)`). They're inverse permutations: `order(a)[rank(a)]` = `rank(a)[order(a)]` = `1 2 3 4` – Glen_b Sep 27 '13 at 06:43
  • "?order tells you which element of the original vector needs to be put first, second, etc., so as to sort the original vector, whereas ?rank tell you which element has the lowest, second lowest, etc., value." There. That's all anyone had to say. Finally. Thank you!! – Aleksandr Hovhannisyan Dec 29 '17 at 21:03
  • succinctly explained .. what one needs "?order tells you which element of the original vector needs to be put first, second, etc., so as to sort the original vector, whereas ?rank tell you which element has the lowest, second lowest, etc., value. " – KaLi Mar 06 '18 at 06:25

Running this little piece of code allowed me to understand the order function:

x <- c(3, 22, 5, 1, 77)

cbind(
  index=1:length(x),
  rank=rank(x),
  x, 
  order=order(x), 
  sort=sort(x)
)

     index rank  x order sort
[1,]     1    2  3     4    1
[2,]     2    4 22     1    3
[3,]     3    3  5     3    5
[4,]     4    1  1     2   22
[5,]     5    5 77     5   77

Reference: http://r.789695.n4.nabble.com/I-don-t-understand-the-order-function-td4664384.html

kazuwal

This could help you at some point.

a <- c(45,50,10,96)
a[order(a)]

What you get is

[1] 10 45 50 96

The code I wrote says you want all of "a" (as a subset of itself), ordered from the lowest to the highest value.
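The same pattern works in the other direction. Either decreasing = TRUE or, for numeric data only, negating the vector gives the positions from highest to lowest:

```r
a <- c(45, 50, 10, 96)

a[order(a, decreasing = TRUE)]   # 96 50 45 10
a[order(-a)]                     # same result for numeric a

stopifnot(identical(a[order(a, decreasing = TRUE)], sort(a, decreasing = TRUE)),
          identical(a[order(-a)], a[order(a, decreasing = TRUE)]))
```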

Alejandro Carrera

In simple words, order() gives the locations of the elements, from the smallest to the largest.

For example, order(c(10,20,30)) will give 1,2,3 and order(c(30,20,10)) will give 3,2,1.
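When values are tied, order() lists their locations in their original order (the default sort used by order() is stable), so the result is still well defined. A quick check with a made-up vector:

```r
x <- c(20, 10, 20, 5)   # hypothetical vector with a tie at positions 1 and 3

order(x)   # 4 2 1 3: the tied 20s keep their original order (1 before 3)

stopifnot(identical(order(x), c(4L, 2L, 1L, 3L)))
```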

Horai Nuri
Arnab Jana

They are similar, but not the same.

set.seed(0)
x<-matrix(rnorm(10),1)

# one can compute from the other
rank(x)  == col(x)%*%diag(length(x))[order(x),]
order(x) == col(x)%*%diag(length(x))[rank(x),]
# rank can be used to sort
sort(x) == x%*%diag(length(x))[rank(x),]
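For readers less used to permutation matrices, the same three identities can be written with plain indexing. A sketch, assuming no ties (true with probability one for rnorm() draws) and using a plain vector rather than the 1-row matrix above:

```r
set.seed(0)
x <- rnorm(10)
n <- length(x)

# rank from order: invert the permutation by assignment
r <- integer(n); r[order(x)] <- seq_len(n)
stopifnot(r == rank(x))

# order from rank: invert the other way
o <- integer(n); o[rank(x)] <- seq_len(n)
stopifnot(o == order(x))

# rank can be used to sort: scatter each value into its ranked slot
s <- numeric(n); s[rank(x)] <- x
stopifnot(identical(s, sort(x)))
```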
Nick Nassuphis
  • rank is the inverse permutation of order: `all(x==x[order(x)][rank(x)])` is always true. Some permutations are their own inverse, but most are not. The inverse of the sorting permutation returned by order() is rank(). This explains why they are sometimes the same and other times not. – Nick Nassuphis Jan 17 '20 at 19:07