4

Desired Output

I want a function to return a list such that, given a "jumbled" list l, each element is the index of the corresponding element of l, if l was sorted. (I'm failing to think of a less convoluted way of saying this, sorry.)

Examples

f([3,1,2]) = [2,0,1]

f([3,1,2,2,3]) = [3,0,1,2,4], since the input sorted is [1,2,2,3,3].

(This is useful for some stats calculations.)

My Attempt

I came up with a way to do this function, but this is python- it seems like there should be a one-liner to do this, or at least a much cleaner, clearer way.

def getIndiciesInSorted(l):
    sortedL = sorted(l)
    outputList = []
    for num in l:
        sortedIndex = sortedL.index(num)
        outputList.append(sortedIndex)
        sortedL[sortedIndex] = None
    return outputList

l=[3,1,2,2,3] 
print getIndiciesInSorted(l)

So, how can I write this more concisely? Is there a legible list comprehension solution?

Matthew Adams
  • 9,426
  • 3
  • 27
  • 43
  • Note that this is not the same as `sorted(range(len(l)), key=lambda i: l[i])`... (Though that was my first thought.) – Matthew Adams Sep 14 '12 at 00:18
  • @nightcracker I know, your answer was my first attempt as well, so I thought I'd leave a comment to let other people know that that method was a no-go. – Matthew Adams Sep 14 '12 at 00:22

5 Answers5

5
def argsort(seq):
    # http://stackoverflow.com/questions/3382352/3382369#3382369
    # http://stackoverflow.com/questions/3071415/3071441#3071441
    '''
    >>> seq=[1,3,0,4,2]
    >>> index=argsort(seq)
    [2, 0, 4, 1, 3]

    Given seq and the index, you can construct the sorted seq:
    >>> sorted_seq=[seq[x] for x in index]
    >>> assert sorted_seq == sorted(seq)

    Given the sorted seq and the index, you can reconstruct seq:
    >>> assert [sorted_seq[x] for x in argsort(index)] == seq
    '''
    return sorted(range(len(seq)), key=seq.__getitem__)

def f(seq):
    idx = argsort(seq)
    return argsort(idx)

print(f([3,1,2]))
# [2, 0, 1]

print(f([3,1,2,2,3]))
# [3, 0, 1, 2, 4]

Note that nightcracker's function is faster:

def get_sorted_indices(l):
    sorted_positions = sorted(range(len(l)), key=l.__getitem__)
    result = [None for _ in range(len(l))]
    for new_index, old_index in enumerate(sorted_positions):
        result[old_index] = new_index
    return result

The difference may be significant for long lists:

In [83]: import random
In [98]: l = [random.randrange(100) for _ in range(10000)]
In [104]: timeit get_sorted_indices(l)
100 loops, best of 3: 4.73 ms per loop

In [105]: timeit f(l)
100 loops, best of 3: 6.64 ms per loop
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • 1
    This sorts twice, if speed is a factor my version _might_ be faster (on the other hand, the sorting code is coded in C, so that might actually make yours faster). – orlp Sep 14 '12 at 00:31
4

This is the best I came up with:

def get_sorted_indices(l):
    sorted_positions = sorted(range(len(l)), key=l.__getitem__)
    result = [None for _ in range(len(l))]

    for new_index, old_index in enumerate(sorted_positions):
        result[old_index] = new_index

    return result

It's faster than your solution, but that's about it.

orlp
  • 112,504
  • 36
  • 218
  • 315
  • +1: Indeed, your method is faster, especially for longer lists. (PS. I think `[None for _ in len(l)]` needs to be `[None for _ in range(len(l)]`. Also, the code is a bit faster if you change `key = lambda ...` to `key = l.__getitem__`. ) – unutbu Sep 14 '12 at 01:16
  • @unutbu: incorporated the changes you suggested. – orlp Sep 14 '12 at 01:40
  • Or just `result = [None] * len(l)` – Warren Weckesser Sep 14 '12 at 03:49
  • @Warren That's more succinct, but list multiplication is very often a bad idea. Even though it would be fine in this case, on a Q&A site where there's lots of new Pythonistas around I tend to think it's better not to demonstrate list multiplication, lest someone try to generalise it and end up contributing to the enormous mountain of Python SO questions to do with shared-references in Python. – Ben Sep 14 '12 at 04:30
2

There's a one-line comprehension but it's really ugly:

>>> E, S = enumerate, sorted
>>> l = [3,1,2,2,3]
>>> [a for _,a in S((a,b) for b,(_,a) in E(S((a,b) for b,a in E(l))))]
[3, 0, 1, 2, 4]

Unutbu's answer is easier to read and generates less garbage.

Community
  • 1
  • 1
Gareth Rees
  • 64,967
  • 9
  • 133
  • 163
2
k = [3, 0, 1, 2, 4]
initial = dict(zip(k, range(len(k)))) #{0: 1, 1: 2, 2: 3, 3: 0, 4: 4}
sorted_initial = dict(zip(sorted(k), range(len(k)))) #{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
initial.update(sorted_initial) #{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
result = [initial[i] for i in k] #[3, 0, 1, 2, 4]
thikonom
  • 4,219
  • 3
  • 26
  • 30
2

If you are doing statistical calculations, you will probably start using numpy at some point. With numpy, you can use the existing implementation of argsort:

>>> from numpy import array
>>> x = array([3, 1, 2, 2, 3])
>>> x.argsort().argsort()
array([3, 0, 1, 2, 4])

That's a numpy version of unutbu's answer. nightcracker's answer can be implemented as

>>> from numpy import array, empty_like, arange
>>> x = array([3, 1, 2, 2, 3])
>>> s = x.argsort()
>>> r = empty_like(s)
>>> r[s] = arange(x.size)
>>> r
array([3, 0, 1, 2, 4])
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214