0

I need create a set of the IDs of some messages, and the positions in the original list. The code is used to sort the messages an later handle them according to ID.

The following works, is readable, but slow.

import numpy as np
IDs=np.array([354,45,45,34,354])#example, the actual array is huge

Dict={}
for counter in xrange(len(IDs)):
    try:
        Dict[IDs[counter]].append(counter)
    except:
        Dict[IDs[counter]]=[counter]
print(Dict)
#{354: [0, 4], 34: [3], 45: [1, 2]}

Any ideas how to speed it up? There is no need for the lists to be sorted. Later in the code is used as follows, and after that the dict is discarded

for item in Dict.values():
    Position_of_ID=Position[np.array(item)]
    ...
Moses Koledoye
  • 77,341
  • 8
  • 133
  • 139
Okapi575
  • 710
  • 1
  • 10
  • 24

3 Answers3

1

Try to use defaultdict and enumerate:

from collections import defaultdict    
Dict = defaultdict(list)
for i,id in enumerate(IDs):
    Dict[id].append(i)

(using try and except is a bad idea if the exceptions aren't rare)

Community
  • 1
  • 1
Ohad Eytan
  • 8,114
  • 1
  • 22
  • 31
0

The fastest code I came up with, is this. It does much more math, is not as readable, and I am not proud, but it is a lot quicker (even with large arrays):

    Sorted_positions_of_IDs=np.argsort(IDs,kind='mergesort')
    SortedIDs=IDs[Sorted_positions_of_IDs]
    Position=0    
    Position_last=-1
    Dict={}
    while(Position<len(Sorted_positions_of_IDs)):
        ID=SortedIDs[Position]
        Position_last=np.searchsorted(SortedIDs,ID,side='right')
        Dict[ID]=Sorted_positions_of_IDs[Position:Position_last]
        Position=Position_last

Anyway, good Ideas will be appreciated.

Okapi575
  • 710
  • 1
  • 10
  • 24
0

Mutch faster is using a "dictcompression"

Dict = {id:i for i, id in enumerate(IDs)}
Okapi575
  • 710
  • 1
  • 10
  • 24