Create set and lists with the positions in the set efficently

Question

I need create a set of the IDs of some messages, and the positions in the original list. The code is used to sort the messages an later handle them according to ID.

The following works, is readable, but slow.

import numpy as np
IDs=np.array([354,45,45,34,354])#example, the actual array is huge

Dict={}
for counter in xrange(len(IDs)):
    try:
        Dict[IDs[counter]].append(counter)
    except:
        Dict[IDs[counter]]=[counter]
print(Dict)
#{354: [0, 4], 34: [3], 45: [1, 2]}

Any ideas how to speed it up? There is no need for the lists to be sorted. Later in the code is used as follows, and after that the dict is discarded

for item in Dict.values():
    Position_of_ID=Position[np.array(item)]
    ...

score 1 · Answer 1 · edited May 23 '17 at 10:31

1

Try to use defaultdict and enumerate:

from collections import defaultdict    
Dict = defaultdict(list)
for i,id in enumerate(IDs):
    Dict[id].append(i)

(using try and except is a bad idea if the exceptions aren't rare)

edited May 23 '17 at 10:31

Community

1
1

answered Aug 08 '16 at 10:14

Ohad Eytan

8,114
1
22
31

1

with my data that saves about a third on execution time, nice. – Okapi575 Aug 08 '16 at 11:58
But still there must be a way do this without the pure python loop, and probably without using dicts. – Okapi575 Aug 08 '16 at 12:30

score 0 · Answer 2 · answered Aug 09 '16 at 06:04

The fastest code I came up with, is this. It does much more math, is not as readable, and I am not proud, but it is a lot quicker (even with large arrays):

    Sorted_positions_of_IDs=np.argsort(IDs,kind='mergesort')
    SortedIDs=IDs[Sorted_positions_of_IDs]
    Position=0    
    Position_last=-1
    Dict={}
    while(Position<len(Sorted_positions_of_IDs)):
        ID=SortedIDs[Position]
        Position_last=np.searchsorted(SortedIDs,ID,side='right')
        Dict[ID]=Sorted_positions_of_IDs[Position:Position_last]
        Position=Position_last

Anyway, good Ideas will be appreciated.

score 0 · Accepted Answer · answered Apr 26 '17 at 07:03

0

Mutch faster is using a "dictcompression"

Dict = {id:i for i, id in enumerate(IDs)}

answered Apr 26 '17 at 07:03

Okapi575

710
1
10
24

Create set and lists with the positions in the set efficently

3 Answers3