
I have a large unsorted list of integers; values may be duplicated. I would like to create another list of sub-lists of indexes from the first list, grouped by value from the maximum element down to the minimum. For example, if I have a list like this:

list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]

The output should be:

indexList = [[10, 17], [5], [14], [22, 25], [18], [13, 27], [3, 19], [9], [16, 20], [4], [0, 2, 7, 11, 12, 21], [8, 26], [6, 24], [1, 15, 23]]

where [10, 17] are the indexes at which '14' is present, and so on...

My code is below. Profiling it with cProfile for a list of around 9,000 elements shows it takes roughly 6 seconds.

def indexList(ls):
    # Sort a copy of the list in decreasing order
    sortedList = sorted(ls, reverse=True)

    # Deduplicate while preserving the sorted order
    seen = set()
    uSortedList = [x for x in sortedList if x not in seen and not seen.add(x)]

    # For each unique value, collect every index where it occurs
    indexList = []
    for e in uSortedList:
        indexList.append([i for i, j in enumerate(ls) if j == e])

    return indexList
Learning

1 Answer


Here you go:

def get_list_indices(ls):
    indices = {}
    for n, i in enumerate(ls):
        try:
            indices[i].append(n)
        except KeyError:
            indices[i] = [n]
    return [i[1] for i in sorted(indices.items(), reverse=True)]

test_list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]
print(get_list_indices(test_list))

Based on some very basic testing, it is about twice as fast as the code you posted.
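If you would rather avoid the try/except entirely, `collections.defaultdict` gives the same single-pass accumulation. This variant is a sketch of mine, not part of the original answer; the function name `get_list_indices_dd` is my own:

```python
from collections import defaultdict

def get_list_indices_dd(ls):
    # defaultdict(list) creates the empty list on first access,
    # so no try/except or membership test is needed.
    indices = defaultdict(list)
    for n, i in enumerate(ls):
        indices[i].append(n)
    # Sort the distinct values in decreasing order and emit their index lists.
    return [indices[k] for k in sorted(indices, reverse=True)]

test_list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]
print(get_list_indices_dd(test_list))
```

It does the same work as the answer's version; whether it is measurably faster depends on the mix of duplicates in the input.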

pzp
  • This is exactly what I would do: store the indices as dictionary values, then print them sorted by the keys. The only change I'd make (and I'm not sure it's better) is code-related: instead of expecting an error, I would check `if i not in indices: indices[i] = []` – SomethingSomething Sep 11 '15 at 22:19
  • You can use `indices.setdefault(i, []).append(n)` to avoid the try-except, and instead of `indices.items()` we can simply use `indices`. – Ashwini Chaudhary Sep 11 '15 at 22:20
  • @pzp, I tested your code and yes, it's around twice as fast as mine, thanks. Can this be improved further? – Learning Sep 11 '15 at 22:21
  • @SomethingSomething In Python it is better to use a ``try... except...`` than an if statement. This is related to the concept of [EAFP](https://docs.python.org/2/glossary.html#term-eafp) (Easier to Ask for Forgiveness than Permission) vs. [LBYL](https://docs.python.org/2/glossary.html#term-lbyl) (Look Before You Leap), which you can find out more about in this [SO question](https://stackoverflow.com/questions/11360858/what-is-the-eafp-principle-in-python). – pzp Sep 11 '15 at 22:23
  • I think it's the best possible solution. It iterates your list only once, sorts the keys in O(n log n), and prints them according to the order. I can't think of anything faster. Your input must be very, very large if the speed does not satisfy you. – SomethingSomething Sep 11 '15 at 22:23
  • @pzp, according to the example in your link, I don't think my solution is slower. To get to the exception, it also has to first look for the key in the dictionary. Anyway, it's nice to know another coding style – SomethingSomething Sep 11 '15 at 22:29
  • @SomethingSomething yes, my input list is quite large, but the solution is working within the time for now. – Learning Sep 11 '15 at 22:31
  • @SomethingSomething But it only has to catch the exception if the item does not exist (in which case it's "wasting" a lookup); if it already exists, it just goes on doing its business. LBYL requires a "wasted" lookup regardless of whether or not the element is already in the dictionary. When working with a large list with a lot of duplicates, as the OP is, it can save quite a few lookups. In addition (I don't know if the link I gave you discussed this), EAFP also helps prevent race conditions and is generally considered more Pythonic. – pzp Sep 11 '15 at 22:35
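For reference, the `setdefault` approach Ashwini Chaudhary suggests in the comments would look like this. This is a sketch of mine (the name `get_list_indices_sd` is my own); it produces the same output as the answer's code:

```python
def get_list_indices_sd(ls):
    indices = {}
    for n, i in enumerate(ls):
        # setdefault returns the existing list for key i,
        # inserting an empty one first if the key is new.
        indices.setdefault(i, []).append(n)
    # Sorting the keys (rather than items) is enough, since we
    # only need the values back in decreasing-key order.
    return [indices[k] for k in sorted(indices, reverse=True)]

test_list = [4, 1, 4, 8, 5, 13, 2, 4, 3, 7, 14, 4, 4, 9, 12, 1, 6, 14, 10, 8, 6, 4, 11, 1, 2, 11, 3, 9]
print(get_list_indices_sd(test_list))
```

Note that `setdefault` always constructs the empty `[]` argument, even when the key already exists, so for inputs with many duplicates the try/except version can still come out slightly ahead.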