Sorting based on one of the list among Nested list in python

Question

I have a list as [[4,5,6],[2,3,1]]. Now I want to sort the list based on list[1] i.e. output should be [[6,4,5],[1,2,3]]. So basically I am sorting 2,3,1 and maintaining the order of list[0].

While searching I got a function which sorts based on first element of every list but not for this. Also I do not want to recreate list as [[4,2],[5,3],[6,1]] and then use the function.

Welcome to Stack Overflow! We encourage you to [research your questions](http://stackoverflow.com/questions/how-to-ask). If you've [tried something already](http://whathaveyoutried.com/), please add it to the question - if not, research and attempt your question first, and then come back. — , Sep 11 '12 at 11:04
`basically I am sorting 2,3,1 and maintaining the order of list[0]` - errr, [6,4,5] != [4,5,6] ? Are you just after `your_list[1].sort()` ? — Jon Clements, Sep 11 '12 at 11:05
I don't think you can get away without overhead of creating something - whether via zip or numpy. — gorlum0, Sep 11 '12 at 11:54

score 7 · Answer 1 · edited Apr 03 '23 at 18:00

7

Since [4, 5, 6] and [2, 3, 1] serves two different purposes I will make a function taking two arguments: the list to be reordered, and the list whose sorting will decide the order. I'll only return the reordered list.

This answer has timings of three different solutions for creating a permutation list for a sort. Using the fastest option gives this solution:

def pyargsort(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def using_pyargsort(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return [a[i] for i in pyargsort(b)]                     

print using_pyargsort([4, 5, 6], [2, 3, 1])    # [6, 4, 5]

The pyargsort method is inspired by the numpy argsort method, which does the same thing much faster. Numpy also has advanced indexing operations whereby an array can be used as an index, making possible very quick reordering of an array.

So if your need for speed is great, one would assume that this numpy solution would be faster:

import numpy as np

def using_numpy(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return np.array(a)[np.argsort(b)].tolist()

print using_numpy([4, 5, 6], [2, 3, 1])     # [6, 4, 5]

However, for short lists (length < 1000), this solution is in fact slower than the first. This is because we're first converting the a and b lists to array and then converting the result back to list before returning. If we instead assume you're using numpy arrays throughout your application so that we do not need to convert back and forth, we get this solution:

def all_numpy(a, b):
    "Reorder array a the same way as array b would be reordered by a normal sort"
    return a[np.argsort(b)]

print all_numpy(np.array([4, 5, 6]), np.array([2, 3, 1]))    # array([6, 4, 5])

The all_numpy function executes up to 10 times faster than the using_pyargsort function.

The following logaritmic graph compares these three solutions with the two alternative solutions from the other answers. The arguments are two randomly shuffled ranges of equal length, and the functions all receive identically ordered lists. I'm timing only the time the function takes to execute. For illustrative purposes I've added in an extra graph line for each numpy solution where the 60 ms overhead for loading numpy is added to the time.

enter image description here

As we can see, the all-numpy solution beats the others by an order of magnitude. Converting from python list and back slows the using_numpy solution down considerably in comparison, but it still beats pure python for large lists.

For a list length of about 1'000'000, using_pyargsort takes 2.0 seconds, using_nympy + overhead is only 1.3 seconds, while all_numpy + overhead is 0.3 seconds.

edited Apr 03 '23 at 18:00

Glorfindel

21,988
13
81
109

answered Sep 11 '12 at 11:08

Lauritz V. Thaulow

49,139
12
73
92

3

No need to install a third party library for a simple task like sorting a list... numpy is not part of a standard python installation. – l4mpi Sep 11 '12 at 11:10
2

@l4mpi Read the question again. It's not a standard sort, but reordering the first item of a list the same way that a standard sort of the second item would. Furthermore, a downvote *means* "this answer is not useful". If someone already uses numpy, this answer *is* useful, and even if not, it presents an option. Your downvote is inappropriate. – Lauritz V. Thaulow Sep 11 '12 at 11:13
1

Well it gets +1 from me as a perfectly viable method. It's also easier to read than sorted/zip/unpacking code. Also, most linux distro's have numpy as standard, and if the OP is venturing into "out of the ordinary" stuff like this - it's an option well worth knowing... – Jon Clements Sep 11 '12 at 11:15
1

I understood what he wants to do and it is still not too hard to do in pure python. Your answer gives off the impression (to someone unfamilliar with the subject) that numpy is part of the standard library and is, as you stated, only useful for someone using numpy or wanting to use it (which would be a bad idea if all he does with it is this sort). – l4mpi Sep 11 '12 at 11:19
1

Using 3rd party stuff should be avoided unless necessary (or *really* massively shortening things). Using numpy's argsort() means overhead. No downvote from me anyway. I just don't like this answer. – Alfe Sep 11 '12 at 11:20
2

@l4mpi I'll edit the answer to make clear that numpy is not part of the standard library. – Lauritz V. Thaulow Sep 11 '12 at 11:21
I still don't think that `list(numpy.array(a[0])[numpy.argsort(a[1])])` (that's what it boils down to, besides import and init) is better than the `zip` answer. – Alfe Sep 11 '12 at 11:33
1

I like your plot, but *sigh*, it has no axis labels ;-). – mgilson Sep 11 '12 at 13:18
Now you got my upvote as well. I think the runtime difference between pyargsort and the numpy version is too small to justify using numpy. It all boils down to whether numpy.sort() is faster than list.sort() or not. And this is just an implementation detail of the current versions (and thus may change any time). There is no reason in principle why the python sort should be slower than the numpy sort, am I right? – Alfe Sep 11 '12 at 14:26
@Alfe For the sorting step, you are right. But this also has a indexing step, which is bound to be slower in python, and the numpy solution has unneeded `list` to `array` to `list` convertions slowing it down. But I'm nitpicking, since I agree that nobody should install numpy if this is the only way they'll use it, given the nice python solution. – Lauritz V. Thaulow Sep 11 '12 at 17:13
@Alfe I've updated the graph with a solution that does not convert back and forth from `list`. The results speak for themselves. – Lauritz V. Thaulow Sep 17 '12 at 09:37
That's very complete now. But let's not forget that runtime is just one aspect of the task at hand. Another (and in most cases the more important) aspect is code clearance. To help the next maintainer of your code understand what it does (or shall do, in case of a bug), the most descriptive code is often better than the fastest, most efficient and optimized version. That's why I still prefer a clear Python-only version unless numpy already is part of the project. – Alfe Sep 17 '12 at 14:29

mgilson · Answer 2 · 2012-09-11T12:10:30.770

5

The sorting you describe is not very easy to accomplish. The only way that I can think of to do it is to use zip to create the list you say you don't want to create:

lst = [[4,5,6],[2,3,1]]
# key = operator.itemgetter(1) works too, and may be slightly faster ...
transpose_sort = sorted(zip(*lst),key = lambda x: x[1])
lst = zip(*transpose_sort)

Is there a reason for this constraint?

(Also note that you could do this all in one line if you really want to:

lst = zip(*sorted(zip(*lst),key = lambda x: x[1]))

This also results in a list of tuples. If you really want a list of lists, you can map the result:

lst = map(list, lst)

Or a list comprehension would work as well:

lst = [ list(x) for x in lst ]

edited Sep 11 '12 at 12:10

answered Sep 11 '12 at 11:09

mgilson

300,191
65
633
696

avoid map: `[list(t) for t in zip(*sorted(zip(*[[4,5,6],[2,3,1]]),key = lambda x: x[1]))]` But I guess that `list(zip(*sorted(zip(*[[4,5,6],[2,3,1]]),key = lambda x: x[1]))[0])` is what he really wants. – Alfe Sep 11 '12 at 11:17
@Alfe -- I don't see any reason to avoid `map` in favor of a list-comp here (other than it works identically for py2k and py3k I suppose). – mgilson Sep 11 '12 at 11:20
Sorry for my brevity; I meant "to avoid map" (just in case you want to, which I usually do). – Alfe Sep 11 '12 at 11:24

l4mpi · Answer 3 · 2012-09-11T12:18:12.857

0

If the second list doesn't contain duplicates, you could just do this:

l = [[4,5,6],[2,3,1]]   #the list
l1 = l[1][:]            #a copy of the to-be-sorted sublist
l[1].sort()             #sort the sublist
l[0] = [l[0][l1.index(x)] for x in l[1]] #order the first sublist accordingly

(As this saves the sublist l[1] it might be a bad idea if your input list is huge)

edited Sep 11 '12 at 12:18

answered Sep 11 '12 at 11:26

l4mpi

5,103
3
34
54

fails if numbers in l[1] appear more than once. – Alfe Sep 11 '12 at 11:35
I stated this in the Answer, no need to downvote... OP didn't specify if his list contains duplicates, and if it did, the order of the first list wouldn't be deterministic anyways. E.g. `[[1,2,3],[5,4,5]]` could be `[[2,1,3],[4,5,5]]` or `[[2,3,1],[4,5,5]]` ... – l4mpi Sep 11 '12 at 11:38
Nope. If values appear more than once in l[1], some values in l[0] will be repeated, other will not appear at all. Your result then is not a sorted version of the input (e.g. you return `[6, 4, 4]` for `l = [[4,5,6],[2,2,1]]`. Since the asker did not specify that values cannot appear more than once in l[1], you must not assume this (or at least specify it loudly if you do). – Alfe Sep 11 '12 at 11:59
You misunderstood my comment... It is not even specified how the correct oder of the elements should be if there are duplicates, this would be down to sorting conventions (keeping original order vs. least moves etc). And I did write it in my answer, I can edit that to be in all caps if you think it isn't loud enough :) – l4mpi Sep 11 '12 at 12:05
»It is not even specified how the correct oder of the elements should be if there are duplicates, this would be down to sorting conventions« — that's not my point. My point is that there are elements missing and others duplicated in your output. You cannot blame this on being underspecified by the sorting spec. If you want to convince me that what the asker asked could be automatically amended by "btw, the numbers in the second list all appear just once, guaranteed", you've got a long way to go. – Alfe Sep 11 '12 at 12:10
Putting your restrictive spec in all caps *might* make me revoke my downvote ;-) But I will still think that this is not a good answer to the question asked. – Alfe Sep 11 '12 at 12:12
changed my wording... better? And I still think my answer is better for small or medium sized lists than installing a third party library with C extensions... – l4mpi Sep 11 '12 at 12:21
±0 for your tireless effort in mending what's broken by design ;-) I still think that first sorting a list and then *searching* for the index of all elements of the original list in the sorted version is not a clean approach (obviously faulty when there are duplicates in the list, but also when there are not, this does not seem like a good idea). Sequentially searching for each element will have O(n log n) again, as does the sorting. You're lucky that O(2 n log n) is O(n log n) ;-) – Alfe Sep 11 '12 at 14:03

Alfe · Answer 4 · 2012-09-11T14:10:25.113

How about this one:

a = [[4,5,6],[2,3,1]]
[a[0][i] for i in sorted(range(len(a[1])), key=lambda x: a[1][x])]

It uses the principal way numpy does it without having to use numpy and without the zip stuff.

Neither using numpy nor the zipping around seems to be the cheapest way for giant structures. Unfortunately the .sort() method is built into the list type and uses hard-wired access to the elements in the list (overriding __getitem__() or similar does not have any effect here).

So you can implement your own sort() which sorts two or more lists according to the values in one; this is basically what numpy does.

Or you can create a list of values to sort, sort that, and recreate the sorted original list out of it.

Ah, I see now that the numpy version above is now also amended by this solution (called pyargsort there). — Alfe, Sep 11 '12 at 14:21

Sorting based on one of the list among Nested list in python

4 Answers4