11

Here's a example of what I want to do

spam_list = ["We", "are", "the", "knights", "who", "say", "Ni"]
spam_order = [0,1,2,4,5,6,3]
spam_list.magical_sort(spam_order)
print(spam_list)

["We", "are", "the", "who", "say", "Ni", "knights"]

I can do it with enumerate, list and so on, but I would like to directly affect spam_list, like list.sort() and not copy it like sorted()

Edit : pushed a string example to avoid confusion between indices and values of spam_list

Edit : turned out this is a duplicate of Python sort parallel arrays in place?. Well, I can't delete so much efforts for SO consistency arguments.

Community
  • 1
  • 1
Evpok
  • 4,273
  • 3
  • 34
  • 46
  • I assume you will also not accept solutions which make a copy of `spam_order`? – ninjagecko May 23 '11 at 14:06
  • Well, this I can stand. `spam_order` is just a list of `int`s, it's not that much memory-expensive. – Evpok May 23 '11 at 14:27
  • This is an interesting problem because, depending on the context, the real solution may lie upstream. I can think of three cases: 1) the sort indices vary and are independent of the data, 2) the sort indices are fixed and independent of the data, and 3) the sort indices are derived from the data somewhere upstream. Each of the three suggests different sets of solutions for live code. – TechNeilogy May 23 '11 at 14:53
  • My issue here is with 2). But yeah, others are interesting, too. – Evpok May 23 '11 at 15:04
  • Most of the solutions here are very good, but solve the most general case, i.e. #1. If the index order is fixed over a large number of variable lists that must be sorted, there may be more efficient solutions that take advantage of that fact. – TechNeilogy May 23 '11 at 15:09
  • @TechNeilogy If you have one, I am very interested. This sort is supposed to be made on thousands of lists of quite long strings. – Evpok May 23 '11 at 15:12
  • Why would copying `spam_order` be okay but not copying `spam_list`? They are both lists of an equal number of references. The strings are independent objects, so their lengths does not matter to the list size. – Yann Vernier May 23 '11 at 16:27
  • @Yann Vernier Well, I had forgotten this fact. Seems that most of my concerns were pointless. Thank you. – Evpok May 23 '11 at 16:35
  • I think Yann's idea is good. It may be more efficient just to copy the whole. If not, (i.e. the lists themselves are very, very, very long) I was going to suggest swapping the spam references using order as an index. – TechNeilogy May 23 '11 at 16:43

7 Answers7

24

You could try:

spam_list = [spam_list[i] for i in spam_order]
Peter Collingridge
  • 10,849
  • 3
  • 44
  • 61
  • Well good one. One line, eases my messing with `zip` and the like, but still not exactly what I wanted. Thanks though =) – Evpok May 23 '11 at 14:15
  • 1
    I prefer this solution, which could be made more in-place with slice assignment -- though a copy would still be generated. – senderle May 23 '11 at 14:58
  • 7
    Personally I'd probably `spam_list = map(spam_list.__getitem__, spam_order)`, which works about the same but does less lookups. – Yann Vernier May 23 '11 at 16:30
  • 1
    @Yann Vernier, interesting point -- a quick test confirms that your suggestion is faster. (But, perhaps not surprisingly, the [lambda expression version](http://stackoverflow.com/questions/6098250/nice-way-to-inverse-sort-permute-a-list-using-a-list-of-indices-in-place/6098372#6098372) is slower.) – senderle May 23 '11 at 17:00
  • @Yann - Nice - I hadn't considered that. – Peter Collingridge May 23 '11 at 17:01
  • 1
    @Yann Vernier This is the best answer so far and I'd accept it if you post it as a separate answer. – Evpok May 23 '11 at 20:38
  • 2
    This isn't in place at all, it creates a new list. – augurar Apr 29 '16 at 08:22
11

You can give a special key to the sort function:

order = dict(zip(spam_list, spam_order))
spam_list.sort(key=order.get)

Edit: As @ninjagecko points out in his answer, this is not really efficient, as it copies both lists to create the dictionary for the lookup. However, with the modified example given by the OP, this is the only way, because one has to build some index. The upside is that, at least for the strings, the values will not be copied, so the overhead is just that of the dictionary itself.

Community
  • 1
  • 1
Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
  • The former suits me perfectly, latter is something else. Both are pretty good though. Thanks a lot. – Evpok May 23 '11 at 14:16
  • +1 for actually using the sort() function. I didn't know you could specify a key like this. – Peter Collingridge May 23 '11 at 14:29
  • Okay then, jump to @ninjagecko's solution. – Evpok May 23 '11 at 14:29
  • @Evpok: I don't think that @ninjagecko's solution offers any benefit here. You need some index-structure to lookup the order of the items, and that requires space. A `dict` is a very efficient index-structure, so you might as well use that. – Björn Pollex May 23 '11 at 14:32
  • @Space_C0wb0y This is really confusing, aren't Python's names just references that can be modified without actually copying the object they point at? (Don't know if this sentence makes sense) – Evpok May 23 '11 at 14:39
  • @Evpok: Yes they are, but those references take up space too. Therefore, this solution does not require constant space, because the dictionary has to be built. – Björn Pollex May 23 '11 at 14:43
  • @Space_C0wb0y Ah, okay. Well I think I can stand using space for more references. In fact my lists aren't that long, their elements are bigs, though, that's why I wanted to avoid deep copy. – Evpok May 23 '11 at 14:47
  • @Evpok: if that is the case, I would recommend this solution, but wrapped in the form of a function `def sortInPlace(toSort, keys):...` so you could say `sortInPlace(spam_values, spam_order)` – ninjagecko May 23 '11 at 15:47
  • Then it will be my choice. Thank you all, you've been of great help. – Evpok May 23 '11 at 16:36
  • 1
    After some tests and profiling, it appears that this solution uses twice more call than `map(spam_list.__getitem__, spam_order)` and doesn't save that much memory, so I change cmy choice (hopefully) for the last time. – Evpok May 23 '11 at 20:35
5

but I would like to directly affect spam_list, like list.sort() and not copy it like sorted()

There is ONLY ONE SOLUTION, that does exactly what you ask. Every single other solution is implicitly making a copy of one or both lists (or turning it into a dict, etc.). What you are asking for is a method which sorts two lists in-place, using O(1) extra space, using one list as the keys of the other. I personally would just accept the extra space complexity, but if you really wanted to, you could do this:

(edit: it may be the case that the original poster doesn't really care about .sort because it's efficient, but rather because it modifies state; in general this is a dangerous thing to want and non-low-level languages attempt to avoid this and even ban it, but the solutions which use slice assignment will achieve "in-place" semantics)

  • Create a custom dictionary subclass (effectively a Zip class) which is backed by both lists you are sorting.
  • Indexing myZip[i] -> results in the tuple (list1[i],list2[i])
  • Assignment myZip[i]=(x1,x2) -> dispatches into list1[i]=x1, list2[i]=x2.
  • Use that to do myZip(spam_list,spam_order).sort(), and now both spam_list and spam_order are sorted in-place

Example:

#!/usr/bin/python3

class LiveZip(list):
    def __init__(self, list1, list2):
        self.list1 = list1
        self.list2 = list2

    def __len__(self):
        return len(self.list1)

    def __getitem__(self, i):
        return (self.list1[i], self.list2[i])

    def __setitem__(self, i, tuple):
        x1,x2 = tuple
        self.list1[i] = x1
        self.list2[i] = x2

spam_list = ["We", "are", "the", "knights", "who", "say", "Ni"]
spam_order = [0,1,2,4,5,6,3]

#spam_list.magical_sort(spam_order)
proxy = LiveZip(spam_order, spam_list)

Now let's see if it works...

#proxy.sort()
#fail --> oops, the internal implementation is not meant to be subclassed! lame
# It turns out that the python [].sort method does NOT work without passing in
# a list to the constructor (i.e. the internal implementation does not use the
# public interface), so you HAVE to implement your own sort if you want to not
# use any extra space. This kind of dumb. But the approach above means you can 
# just use any standard textbook in-place sorting algorithm:
def myInPlaceSort(x):
    # [replace with in-place textbook sorting algorithm]

NOW it works:

myInPlaceSort(proxy)

print(spam_list)

Unfortunately there is no way to just sort one list in O(1) space without sorting the other; if you don't want to sort both lists, you might as well do your original approach which constructs a dummy list.

You can however do the following:

spam_list.sort(key=lambda x:x)

but if the key or cmp functions makes any references to any collection (e.g. if you pass in a dict.__getitem__ of a dict you had to construct) this is no better than your original O(N)-space approach, unless you already happened to have such a dictionary lying around.

Turns out this is a duplicate question of Python sort parallel arrays in place? , but that question also had no correct answers except this one , which is equivalent to mine but without the sample code. Unless you are incredibly optimized or specialized code, I'd just use your original solution, which is equivalent in space complexity to the other solutions.

edit2: As senderle pointed out, the OP doesn't want a sort at all, but rather wishes to, I think, apply a permutation. To achieve this, you can and SHOULD use simply indexing that other answers suggest [spam_list[i] for i in spam_order], but an explicit or implicit copy must be made still because you still need the intermediate data. (Unrelated and for the record, applying the inverse permutation is I think the inverse of parallel sorting with the identity, and you can use one to get the other, though sorting is less time-efficient. _,spam_order_inverse = parallelSort(spam_order, range(N)), then sort by spam_order_inverse. I leave the above discussion about sorting up for the record.)

edit3:

It is possible, however, to achieve an in-place permutation in O(#cycles) space but with terrible time efficiency. Every permutation can be decomposed into disjoint permutations applied in parallel on subsets. These subsets are called cycles or orbits. The period is equal to their size. You thus take a leap of faith and do as follows:

Create a temp variable.

For index i=0...N:
    Put x_i into temp, assign NULL to x_i
    Swap temp with x_p(i)
    Swap temp with x_p(p(i))
    ...
    Swap temp with x_p(..p(i)..), which is x_i
    Put a "do not repeat" marker on the smallest element you visited larger than i
    Whenever you encounter a "do not repeat" marker, perform the loop again but
      without swapping, moving the marker to the smallest element larger than i    
    To avoid having to perform the loop again, use a bloom filter

This will run in O(N^2) time and O(#cycles) place without a bloom filter, or ~O(N) time and O(#cycle + bloomfilter_space) space if you use them

Community
  • 1
  • 1
ninjagecko
  • 88,546
  • 24
  • 137
  • 145
  • Well @Space_c0wb0y's solution above doesn't seems to make a copy of any list. – Evpok May 23 '11 at 14:22
  • @Evpok: Yes it does, and I edited my answer accordingly. Thanks @ninjagecko for pointing that out. – Björn Pollex May 23 '11 at 14:25
  • @Space_C0wb0y Oh, I thought that `zip` and `dict` used references instead of actually copying the values. But isn't there an easier way since I only want to sort `spam_list` and I don't mind `spam_order` being kept this way? – Evpok May 23 '11 at 14:31
  • 2
    @ninjagecko: Your solution is very unclear to me. How does it work in O(1) space? – Björn Pollex May 23 '11 at 14:33
  • one moment, I have sample code coming up =) in `O(1)` extra space – ninjagecko May 23 '11 at 14:36
  • 2
    @Evpok: Yes, Python uses references. When you copy a `list` or `dict`, the actual values will not be copied, just the references. – Björn Pollex May 23 '11 at 14:36
  • @Evpok: it is true that they use references, but a reference is basically the same as an `int` in terms of memory, so it's like you're making a duplicate array of ints. There's no reason to shy away from such solutions though; such tradeoffs happen all the time behind the scenes without the programmer's knowledge. – ninjagecko May 23 '11 at 15:43
  • 1
    @ninjagecko, I don't think this does what the OP wants. This does a parallel sort, but what the OP asked for was a -- I don't know what to call it exactly -- a "transformation sort". In other words, the OP wants `["We", "are", "the", "who", "say", "Ni", "knights"]`, but this gives `['We', 'are', 'the', 'Ni', 'knights', 'who', 'say']`. – senderle May 23 '11 at 15:50
  • @senderle: thank you for pointing this out. That makes things easier, though the issue I brought up still applies if the OP cares about it. For the record this is the inverse permutation and is done by doing `identity,spam_order_inverse = parallelSort(spam_order, range(N))`, then sorting by `spam_order_inverse`, of if your keys are already integers, doing the indexing thing other answers have. I shall update answer. – ninjagecko May 23 '11 at 16:17
4

If the issue is specifically in-placeness and not memory usage per se -- if you want this to have side effects, in other words -- then you could use slice assignment. Stealing from Peter Collingridge:

other_spam_list = spam_list
spam_list[:] = [spam_list[i] for i in spam_order]
assert other_spam_list == spam_list

It seems you might even be able to do this with a generator expression! But I suspect this still implicitly creates a new sequence of some sort -- probably a tuple. If it didn't, I think it would exhibit wrong behavior; but I tested it, and its behavior seemed correct.

spam_list[:] = (spam_list[i] for i in spam_order)

Aha! See this excellent answer by the inimitable Sven Marnach -- generator slice assignment does indeed generate an implicit tuple. Which means it's safe, but not as memory efficient as you might think. Still, tuples are more memory efficient than lists, so the generator expression is preferable from that perspective.

Community
  • 1
  • 1
senderle
  • 145,869
  • 36
  • 209
  • 233
  • 1
    Hmm, you may be right that the OP doesn't really care about efficiency and actually wants just in-place semantics. That's sort of strange, because there are entire programming languages dedicated to avoiding in-place semantics... but it may indeed be what he wants, and if so, a function which uses slices is definitely the way to go. – ninjagecko May 23 '11 at 15:51
  • feel free to copy any parts of http://stackoverflow.com/questions/6098250/is-there-a-nice-way-to-sort-a-list-using-a-list-of-indices/6099818#6099818 into your answer; I clarified a bit but did not want to edit your post without permission – ninjagecko May 23 '11 at 16:04
  • Stole and improved upon. Thanks for pointing out slice assignment. I'd not seen that before. – Peter Collingridge May 23 '11 at 16:18
  • In fact I don't care about in-place semantics and just want efficiency, but thank you any way =) – Evpok May 23 '11 at 16:27
2
map(lambda x:spam_list[x], spam_order)
Vader
  • 3,675
  • 23
  • 40
2

If you actually don't care about the efficiency at all, and just want in-place semantics (which is a bit odd, because there are entire programming languages dedicated to avoiding in-place semantics), then you can do this:

def modifyList(toModify, newList):
    toModify[:] = newList

def permuteAndUpdate(toPermute, permutation):
    modifyList(toPermute, [toPermute[i] for i in permutation])

permuteAndUpdate(spam_list, spam_order)

print(spam_list)
# ['We', 'are', 'the', 'Ni', 'knights', 'who', 'say']

Credit goes to senderle for recognizing that this is what the OP may actually be after; he should feel free to copy this answer into his own. Should not accept this answer unless you really prefer it over his.

ninjagecko
  • 88,546
  • 24
  • 137
  • 145
  • Feeling too lazy to edit my post, but I won't be offended if you do. :) – senderle May 23 '11 at 16:40
  • By the way, it never occurred to me before to think of permutation application as inverse sorting, but of course that's exactly what it is -- thanks for that insight. – senderle May 23 '11 at 16:53
0

You may use numpy.

import numpy as np
spam_list = list(np.array(spam_list)[spam_order])
user31264
  • 6,557
  • 3
  • 26
  • 40