
I have a large unsorted list of integers; values may be duplicated. I would like to create another list of sub-lists of indexes from the first list, grouped by value from the maximum element down to the minimum. For example, if I have a list like this:

list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]

The output should be:

indexList = [[10, 17], [5], [14], [22, 25], [18], [13, 27], [3, 19], [9], [16, 20], [4], [0, 2, 7, 11, 12, 21], [8, 26], [6, 24], [1, 15, 23]]

where [10, 17] are the indexes at which '14' is present, and so on...

My code is below. Profiling it with cProfile for a list of around 9,000 elements shows it takes roughly 6 seconds.

def indexList(ls):
    # Sort a copy of the list in decreasing order
    sortedList = sorted(ls, reverse=True)

    # Deduplicate while preserving the sorted order
    seen = set()
    uSortedList = [x for x in sortedList if x not in seen and not seen.add(x)]

    # For each unique value, collect every index where it occurs
    indexList = []
    for e in uSortedList:
        indexList.append([i for i, j in enumerate(ls) if j == e])

    return indexList
Learning

1 Answer


Here you go:

def get_list_indices(ls):
    indices = {}
    for n, i in enumerate(ls):
        try:
            indices[i].append(n)
        except KeyError:
            indices[i] = [n]
    return [i[1] for i in sorted(indices.items(), reverse=True)]

test_list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]
print(get_list_indices(test_list))

Based on some very basic testing, it is about twice as fast as the code you posted.
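If you would rather avoid the try/except entirely, `collections.defaultdict` gives the same single-pass accumulation. This variant is a sketch of mine, not part of the original answer; the function name `get_list_indices_dd` is my own:

```python
from collections import defaultdict

def get_list_indices_dd(ls):
    # defaultdict(list) creates the empty list on first access,
    # so no try/except or membership test is needed.
    indices = defaultdict(list)
    for n, i in enumerate(ls):
        indices[i].append(n)
    # Sort the distinct values in decreasing order and emit their index lists.
    return [indices[k] for k in sorted(indices, reverse=True)]

test_list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]
print(get_list_indices_dd(test_list))
```

It does the same work as the answer's version; whether it is measurably faster depends on the mix of duplicates in the input.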

pzp
  • This is exactly what I would do: store the indices as dictionary values, then print them sorted by the keys. The only change I'd make (and I'm not sure it's better) is code-related: instead of expecting an error, I would check `if i not in indices: indices[i] = []` – SomethingSomething Sep 11 '15 at 22:19
  • You can use `indices.setdefault(i, []).append(n)` to avoid the try-except, and instead of `indices.items()` we can simply use `indices`. – Ashwini Chaudhary Sep 11 '15 at 22:20
  • @pzp, I tested your code and yes, it's around twice as fast as mine, thanks. Can this be improved further? – Learning Sep 11 '15 at 22:21
  • @SomethingSomething In Python it is better to use a ``try... except...`` than an if statement. This is related to the concept of [EAFP](https://docs.python.org/2/glossary.html#term-eafp) (Easier to Ask for Forgiveness than Permission) vs. [LBYL](https://docs.python.org/2/glossary.html#term-lbyl) (Look Before You Leap), which you can find out more about in this [SO question](https://stackoverflow.com/questions/11360858/what-is-the-eafp-principle-in-python). – pzp Sep 11 '15 at 22:23
  • I think it's the best possible solution. It iterates your list only once, sorts the keys in O(n log n), and prints them according to the order. I can't think of anything faster. Your input must be very, very large if the speed does not satisfy you. – SomethingSomething Sep 11 '15 at 22:23
  • @pzp, according to the example in your link, I don't think my solution is slower. To get to the exception, it also has to first look for the key in the dictionary. Anyway, it's nice to know another coding style – SomethingSomething Sep 11 '15 at 22:29
  • @SomethingSomething yes, my input list is quite large, but the solution is working within the time for now. – Learning Sep 11 '15 at 22:31
  • @SomethingSomething But it only has to catch the exception if the item does not exist (in which case it's "wasting" a lookup); if it already exists, it just goes on doing its business. LBYL requires a "wasted" lookup regardless of whether or not the element is already in the dictionary. When working with a large list with a lot of duplicates, as the OP is, it can save quite a few lookups. In addition (I don't know if the link I gave you discussed this), EAFP also helps prevent race conditions and is generally considered more Pythonic. – pzp Sep 11 '15 at 22:35
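For reference, the `setdefault` approach Ashwini Chaudhary suggests in the comments would look like this. This is a sketch of mine (the name `get_list_indices_sd` is my own); it produces the same output as the answer's code:

```python
def get_list_indices_sd(ls):
    indices = {}
    for n, i in enumerate(ls):
        # setdefault returns the existing list for key i,
        # inserting an empty one first if the key is new.
        indices.setdefault(i, []).append(n)
    # Sorting the keys (rather than items) is enough, since we
    # only need the values back in decreasing-key order.
    return [indices[k] for k in sorted(indices, reverse=True)]

test_list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]
print(get_list_indices_sd(test_list))
```

Note that `setdefault` always constructs the empty `[]` argument, even when the key already exists, so for inputs with many duplicates the try/except version can still come out slightly ahead.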