Ranking variables, ordering

Question

How to order a set of variables with given conditions.

If I am given:

A < B, B < C, and C < A. This is impossible.
A < B, C < B, and A < C. The order, from least to greatest, is A, C, B.

I prefer R, if I can just give the method strings such as c("A<B", "C<B", "A<C") and from there, the method returns c("A","C", "B")

Given any number of variables, how can I rank them from least to greatest, based on these conditions? Please do incorporate a break in the code if the scenario is impossible. Thank you.

Oh if you are going to down vote, please oh please, tell me why, so that I can edit my question so that it fits SO standards. Just down voting isn't going to help me, anyone else, or this website in general. — user41912, Aug 04 '14 at 14:29
@beginneR I just need basic programming logic, or pseudocode. Either is fine. If you want an answer, I prefer R. — user41912, Aug 04 '14 at 14:35
Take a look at the answer to this question. http://stackoverflow.com/questions/7515377/sort-a-list-of-nontrivial-elements-in-r — Matthew Plourde, Aug 04 '14 at 15:10
@MatthewPlourde I see how it relates. I am just unsure of how I would approach this still. If you can provide a little bit of code to get me started... — user41912, Aug 04 '14 at 15:18
You could use `igraph` for this. Split your string into vertices - you can then check for cycles and if none return the vertice names in topological order. — user20650, Aug 04 '14 at 15:24
@MatthewPlourde Why did you remove your answer? I actually didn't get to read it... — user41912, Aug 04 '14 at 15:51

user20650 · Answer 1 · 2014-08-26T14:34:30.320

This works for your example (- but not checked for edge case), It uses the idea that dag's (directed acyclic graphs) have a natural ordering. This is the case for your example. An impossible scenario in statements would result in a cycle in the graph hence we can use that to throw an error.

This works if all your statements are strict

library(igraph)

f <- function(X) {  
          d <- do.call("rbind", strsplit(X , "<"))
          g <- graph.data.frame(d)

          if(is.dag(g))
               V(g)$name[topological.sort(g)]
          else
               stop("Graph has cycles")
     }

f(c("A<B", "C<B", "A<C"))
f(c("C<B", "A<C", "A<B"))
f(c("A<B", "B<C", "C<A"))

About the code:

  # this splits the string into two columns 
  # interpreted as from -> to by graph
  do.call("rbind", strsplit(X , "<"))

  # generate graph from edges (d)
  g <- graph.data.frame(d)

  is.dag(g) # checks if the graph is acyclic

  # if there are no cycles return the vertice names in topological order
   V(g)$name[topological.sort(g)]

EDIT

To include the case where vertices are equivalent restricts the use of the above. But we can still use graphical method (although it is less natural and not the R way to do it) - by describing equal vertices with bidirectional edges. If model / statements are fully specified (which it should be),we have a complete graph, and we can use the fact that if two vertices are equal they should have the same set of vertices greater than them and the same less than them - otherwise throw an error.

So we use the ideas of parents and children and compare these between equivalent vertices (A < B <=> parent < child).

The first part of the function is for the cases where the statements are all strictly less or greater than. The second part compares the parent and children of nodes that have equivalence statement.

f <- function(X) {
        l <- do.call("rbind", strsplit(X[grepl("<", X)] , "<"))

       if(!any(grepl("==", X))) {
                g <- graph.data.frame(l)

               if(is.dag(g))
                     V(g)$name[topological.sort(g)]
               else
                     stop("Impossible")
             }

       else {
           e <- do.call("rbind", strsplit(X[grepl("==", X)] , "=="))
           g <- graph.data.frame(rbind(l, e, e[,2:1]))  

           par <- function(g) 
                    setNames(lapply(neighborhood(g, 1, mode="in"), 
                         function(i) sort(V(g)$name[i])), V(g)$name)
           ch <- function(g) 
                    setNames(lapply(neighborhood(g, 1, mode="out"), 
                         function(i) sort(V(g)$name[i])), V(g)$name)

           pareq <- apply(e, 1, 
                        function(i) 
                            identical(par(g)[[i[1]]], par(g)[[i[2]]]))
           cheq <- apply(e, 1, 
                        function(i) 
                            identical(ch(g)[[i[1]]], ch(g)[[i[2]]]))

      if(all(pareq & cheq)) {
           g <- graph.data.frame(rbind(l,e))
           V(g)$name[topological.sort(g)]
          }
      else 
           stop("Impossible")
   }
}

A couple of example

f(X = c("C<B", "A<C", "A<B"))
f(X = c("C==B", "C==A", "A<B"))
f(X = c("C==B", "C<A", "A<B"))
f(X = c("B==C", "C<A", "B<A"))

I have not checked this for all edge cases or larger graphs but it should give an idea to get you started (if you want to go this way)

Wow what a solution! I don't understand all the methods yet (I shall read some documentation!), but for such a unique package to exist, and for you to link my question to this function, is quite remarkable. Great answer. I feel like my question is quite elementary though, but if I think about it, it may be more complex than meets the eye. — user41912, Aug 04 '14 at 15:45
Oh wait, just one issue. What if I have values that are equal, will it still work? @user20650 — user41912, Aug 04 '14 at 15:47
The `strplit` stage would need to be tweaked to account for a different sysmbol, other than `<`. But if they are equal what do you want to happen? — user20650, Aug 04 '14 at 15:50
Actually I will just save you the trouble. Random ordering it is. I don't care if they equal, you can put either first. @user20650 — user41912, Aug 04 '14 at 15:56
Great, sorry I took away "accepted answer". I just want to make sure that if you are unable to answer the question, newcomers to the post still want to make a contribution. — user41912, Aug 04 '14 at 16:00
@user41912; no problem at all. I think its a good idea to leave the check mark off for a while. BTW if you like the `igraph` approach you could add it as a `tag` as there are several `graph` experts that likely could give a better answer — user20650, Aug 04 '14 at 16:22
Hmm, the equality causes problems with this method (even after tweaking the `do.call` stage). For example `c("C — user20650, Aug 04 '14 at 16:35

score 1 · Answer 2 · edited May 23 '17 at 11:49

1

Here's another method that defines custom order operators for your elements and then calls the built in method sort.

rules <- c("A<B", "C<B", "A<C")
vec <- c("A", "B", "C", "A")

class(vec) <- 'letters'
`[.letters` <- function(x, i) {
    x <- unclass(x)
    e <- x[i]
    class(e) <- 'letters'
    e
}
`==.letters` <- function(x, y) unclass(x) == unclass(y)
`>.letters` <- function(x, y) paste(y, x, sep='<') %in% rules

sort(vec)
# [1] "A" "A" "C" "B"

This strategy was suggested by this answer, but the method I use here is simpler.

edited May 23 '17 at 11:49

Community

1
1

answered Aug 04 '14 at 15:47

Matthew Plourde

43,932
7
96
113

It seems to work, however it isn't detecting when it is impossible. – user41912 Aug 04 '14 at 16:56

Ranking variables, ordering

2 Answers2