Given a linear order completely represented by a list of tuples of strings, output the order as a list of strings

Question

Given pairs of items of form [(a,b),...] where (a,b) means a > b, for example:

[('best','better'),('best','good'),('better','good')]

I would like to output a list of form:

['best','better','good']

This is very hard for some reason. Any thoughts?

======================== code =============================

I know why it doesn't work.

def to_rank(raw):

  rank = []

  for u,v in raw:
    if u in rank and v in rank:
      pass

    elif u not in rank and v not in rank:
      rank = insert_front (u,v,rank)
      rank = insert_behind(v,u,rank)

    elif u in rank and v not in rank:
      rank = insert_behind(v,u,rank)

    elif u not in rank and v in rank:
      rank = insert_front(u,v,rank)

  return [[r] for r in rank]

# @Use: insert word u infront of word v in list of words
def insert_front(u,v,words):
  if words == []: return [u]
  else:
    head = words[0]
    tail = words[1:]
    if head == v: return [u] + words
    else        : return ([head] + insert_front(u,v,tail))

# @Use: insert word u behind word v in list of words
def insert_behind(u,v,words):
  words.reverse()
  words = insert_front(u,v,words)
  words.reverse()
  return words

=================== Update ===================

Per suggestion of many, this is a straight forward topological sort setting, I ultimately decided to use the code from this source: algocoding.wordpress.com/2015/04/05/topological-sorting-python/

which solved my problem.

def go_topsort(graph):
in_degree = { u : 0 for u in graph }     # determine in-degree 
for u in graph:                          # of each node
    for v in graph[u]:
        in_degree[v] += 1

Q = deque()                 # collect nodes with zero in-degree
for u in in_degree:
    if in_degree[u] == 0:
        Q.appendleft(u)

L = []     # list for order of nodes

while Q:                
    u = Q.pop()          # choose node of zero in-degree
    L.append(u)          # and 'remove' it from graph
    for v in graph[u]:
        in_degree[v] -= 1
        if in_degree[v] == 0:
            Q.appendleft(v)

if len(L) == len(graph):
    return L
else:                    # if there is a cycle,  
    return []

RockBilly's solution also work in my case, because in my setting, for every v < u, we are guaranteed to have a pair (u,v) in our list. So his answer is not very "computer-sciency", but it gets the job done in this case.

Can we presume that your input list is complete - that is, it contains all the comparisons needed to define the order? — Andrew Guy, Dec 12 '16 at 01:11
Seems like you need to first compute a transitive closure, perhaps using Warshall's algorithm, and then sort according to the resulting total order — John Coleman, Dec 12 '16 at 01:12
How large do you expect the list of tuples to be? How important is performance in the algorithm that you're looking for? — Simeon Visser, Dec 12 '16 at 01:16
@SimeonVisser exponential is fine, average length of list is 6. — xiaolingxiao, Dec 12 '16 at 01:17
The problem is relatively easy if you have `n-1` pairs (where `n` is the number of items) and it is also easy if it has all `n(n-1)/2` pairs. Is it true that sometimes you have an intermediate number of pairs? — John Coleman, Dec 12 '16 at 01:22
@JohnColeman heh, so in my setting if are trying to rank n items, we always have (n-1)! (that's a factorial) pairs. So it's a complete specification. — xiaolingxiao, Dec 12 '16 at 01:24
@chibro2 But the number of distinct pairs are `n(n-1)/2`, so I don't know where the factorial would come in. — John Coleman, Dec 12 '16 at 01:31
This looks like a straightforward case for a [topological sort](https://en.wikipedia.org/wiki/Topological_sorting) algorithm. — user2357112, Dec 12 '16 at 02:20

AChampion · Answer 1 · 2016-12-12T06:10:06.343

If you have a complete grammar specified then you can simply count up the items:

>>> import itertools as it
>>> from collections import Counter
>>> ranks = [('best','better'),('best','good'),('better','good')]
>>> c = Counter(x for x, y in ranks)
>>> sorted(set(it.chain(*ranks)), key=c.__getitem__, reverse=True)
['best', 'better', 'good']

If you have an incomplete grammar then you can build a graph and dfs all paths to find the longest. This isn't very inefficient, as I haven't thought about that yet :):

def dfs(graph, start, end):
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for next_state in graph.get(path[-1], []):
            if next_state in path:
                continue
            stack.append(path+[next_state])

def paths(ranks):
    graph = {}
    for n, m in ranks:
        graph.setdefault(n,[]).append(m)
    for start, end in it.product(set(it.chain(*ranks)), repeat=2):
        yield from dfs(graph, start, end)

>>> ranks = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
>>> max(paths(ranks), key=len)
['black', 'dim', 'dark', 'gloomy']
>>> ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
>>> max(paths(ranks), key=len)
['d', 'b', 'a', 'c']

Counting is an elegantly simple way to approach the problem – John Coleman Dec 12 '16 at 02:57 — John Coleman, Dec 12 '16 at 02:57

score 1 · Accepted Answer · answered Dec 12 '16 at 01:44

1

What you're looking for is topological sort. You can do this in linear time using depth-first search (pseudocode included in the wiki I linked)

answered Dec 12 '16 at 01:44

llllvvuu

286
1
9

you have taken a simple thing to a new level!! – Wasi Ahmad Dec 12 '16 at 01:46
Can you explain how this is any less simple? I simply gave the name of the problem he asked about. – llllvvuu Dec 12 '16 at 01:49
OP doesn't need topological sort or DFS!! It can be achieved with more simpler code. – Wasi Ahmad Dec 12 '16 at 02:45
It's not a question of whether OP needs topological sort. Topological sort isn't the solution. It's the name of the problem he asked. A possible solution is DFS. He doesn't need it, sure, but it is a simple way. – llllvvuu Dec 12 '16 at 02:48
Thanks, classifying a problem is sometimes the most important step – xiaolingxiao Dec 12 '16 at 05:57
The specific case OP deals with does not require DFS, given that we already have a list of the nodes. So just iterating over the them will be the fastest and easiest solution as shown in my answer. But the speed would not much differentiate, given that they are both linear. Nonetheless, producing simplistic code also should be a goal. – Rockybilly Dec 12 '16 at 10:53

Andrew Guy · Answer 3 · 2016-12-12T03:45:54.090

You can take advantage of the fact that the lowest ranked item in the list will never appear at the start of any tuple. You can extract this lowest item, then remove all elements which contain this lowest item from your list, and repeat to get the next lowest.

This should work even if you have redundant elements, or have a sparser list than some of the examples here. I've broken it up into finding the lowest ranked item, and then the grunt work of using this to create a final ranking.

from copy import copy

def find_lowest_item(s):
    #Iterate over set of all items
    for item in set([item for sublist in s for item in sublist]):
        #If an item does not appear at the start of any tuple, return it
        if item not in [x[0] for x in s]:
            return item

def sort_by_comparison(s):
    final_list = []
    #Make a copy so we don't mutate original list
    new_s = copy(s)
    #Get the set of all items
    item_set = set([item for sublist in s for item in sublist])
    for i in range(len(item_set)):
        lowest = find_lowest_item(new_s)
        if lowest is not None:
            final_list.insert(0, lowest)
        #For the highest ranked item, we just compare our current 
        #ranked list with the full set of items
        else:
            final_list.insert(0,set(item_set).difference(set(final_list)).pop())
        #Update list of ranking tuples to remove processed items
        new_s = [x for x in new_s if lowest not in x]
    return final_list

list_to_compare = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
sort_by_comparison(list_to_compare)

['black', 'dim', 'dark', 'gloomy']

list2 = [('best','better'),('best','good'),('better','good')]
sort_by_comparison(list2)

['best', 'better', 'good']

list3 = [('best','better'),('better','good')]
sort_by_comparison(list3)

['best', 'better', 'good']

This is as of now the only other correct answer (it is an implementation of kahns algorithm, although it may not be a linear time implementation). Would be fine to accept this. — llllvvuu, Dec 12 '16 at 02:04

score 1 · Answer 4 · answered Dec 12 '16 at 01:49

Here is one way. It is based on using the complete pairwise rankings to make an old-style (early Python 2) cmp function and then using functools.cmp_to_key to convert it to a key suitable for the Python 3 approach to sorting:

import functools

def sortByRankings(rankings):
    def cmp(x,y):
        if x == y:
            return 0
        elif (x,y) in rankings:
            return -1
        else:
            return 1

    items = list({x for y in rankings for x in y})
    items.sort(key = functools.cmp_to_key(cmp))
    return items

Tested like:

ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
print(sortByRankings(ranks)) #prints ['d', 'b', 'a', 'c']

Note that to work correctly, the parameter rankings must contain an entry for each pair of distinct items. If it doesn't, you would first need to compute the transitive closure of the pairs that you do have before you feed it to this function.

score -1 · Answer 5 · edited May 23 '17 at 12:08

-1

If you do sorting or create a dictionary from the list items, you are going to miss the order as @Rockybilly mentioned in his answer. I suggest you to create a list from the tuples of the original list and then remove duplicates.

def remove_duplicates(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

i = [(5,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
i = remove_duplicates(list(x for s in i for x in s))
print(i)  # prints [5, 2, 1, 3, 4]

j = [('excellent','good'),('excellent','great'),('great','good')]
j = remove_duplicates(list(x for s in j for x in s))
print(j)  # prints ['excellent', 'good', 'great']

See reference: How do you remove duplicates from a list in whilst preserving order?

For explanation on the remove_duplicates() function, see this stackoverflow post.

edited May 23 '17 at 12:08

Community

1
1

answered Dec 12 '16 at 01:45

Wasi Ahmad

35,739
32
114
161

I don't think this is what he asked for. The output for your last example should be 'excellent', 'better', 'good' – llllvvuu Dec 12 '16 at 01:58
@LawrenceWu where did you find `better`? input for my code is `[('excellent','good'),('excellent','great'),('great','good')]` and the answer is correct. don't judge a solution by watching output, see the code. – Wasi Ahmad Dec 12 '16 at 02:44
I meant "great". Yes, the output is still wrong. Read the question carefully - if the tuple (great, good) is in the input then great needs to come before good. – llllvvuu Dec 12 '16 at 02:45
OP didn't mention that, i have read the question carefully. moreover, good is coming before great because good (first appeared in tuple 1) appeared before great (first appeared in tuple 2) in tuples. i didn't change the order as it appeared in the list, so the output should be correct. – Wasi Ahmad Dec 12 '16 at 02:54

Rockybilly · Answer 6 · 2016-12-12T03:00:54.977

-3

If the list is complete, meaning has enough information to do the ranking(Also no duplicate or redundant inputs), this will work.

from collections import defaultdict
lst = [('best','better'),('best','good'),('better','good')]

d = defaultdict(int)

for tup in lst:
    d[tup[0]] += 1
    d[tup[1]] += 0 # To create it in defaultdict

print sorted(d, key = lambda x: d[x], reverse=True)
# ['best', 'better', 'good']

Just give them points, increment the left one each time you encounter it in the list.

Edit: I do think the OP has a determined type of input. Always have tuple count of combination nCr(n, 2). Which makes this a correct solution. No need to complain about the edge cases, which I already knew posting the answer(and mentioned it).

edited Dec 12 '16 at 03:00

answered Dec 12 '16 at 01:24

Rockybilly

2,938
1
13
38

1

`sorted(d)` will sort by keys, not the numeric values. – Simeon Visser Dec 12 '16 at 01:29
@SimeonVisser Ah, forgive my dullness. – Rockybilly Dec 12 '16 at 01:30
@Rockybilly it works on list of three items, but not list of four? for example: [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')] outputs ['black','dark','dim','gloomy'], but should output ['black','dim','dark','gloomy'] – xiaolingxiao Dec 12 '16 at 01:30
@chibro2 The previous version was wrong, please try it again. – Rockybilly Dec 12 '16 at 01:33
@Rockybilly got it! Thanks, boy I never would've thought of that solution. It's too elegant. – xiaolingxiao Dec 12 '16 at 01:35
This is insufficient if you take out the (best, good) tuple. In fact, it gets worse if you then additionally put in a (better, bad) tuple. – llllvvuu Dec 12 '16 at 01:47
Yes, I don't think this works except for a few select cases, as @LawrenceWu pointed out. Incrementing based on representation in the first element of a tuple falls apart very quickly with duplicates, or sparser data (i.e. by removing `('best', 'good')`. – Andrew Guy Dec 12 '16 at 01:52
An example where this solution does not work: `lst = [('c', 'b'), ('c', 'a'), ('b', 'a')]`. It should return `['c', 'b', 'a']`. Instead it returns `['a', 'b', 'c']` – Seth Difley Dec 12 '16 at 02:50
@SethDifley It returns that because sort function presents values in ascending order. Which should be easy to notice. (Just add `reverse=True`). I don't think this solution should be discarded this easily, given that it is a correct solution. – Rockybilly Dec 12 '16 at 02:58
But if your input list is `[('best','better'),('better','good')]`, then it can't distinguish between `'best'` and `'better'`, despite this list being complete in the sense that it provides all the information you need to rank properly. – Andrew Guy Dec 12 '16 at 03:41
@AndrewGuy Yes, any reasonable answer trying to solve a similar question would handle that, But the OP does not have the list you present, he recieves a list of `nCr(n, 2)` elements, which makes my answer always solve the problem. In computer science terms, every computation you make besides this is overkill, so this is also an elegant solution. – Rockybilly Dec 12 '16 at 10:38

Given a linear order completely represented by a list of tuples of strings, output the order as a list of strings

6 Answers6

Linked