1

I have a problem with Python dictionary performance.

I have to read a file that with many x,y location and if draw this point it looks like a circle , and i have to resize this circle by a specific number from the edge. If Mydict has over 50000 items, it becomes slow.

Mydict is like { (X,Y) : {value:1 ,value:2} ,}
Key is a tuple and value is another dict.

Mydict = get_data()
cut_num = 5 # cut size
# start remove x
max_group = tuple(map(max,*Mydict))
min_group = tuple(map(min,*Mydict))
X_max = max_group[0] - cut_num
X_min = min_group[0] + cut_num

remove_keys =[k for k in Mydict if k[0]> X_max or k[0]< X_min ]
for key in remove_keys:
    del Mydict[key]

Remove X value is faster just about 1 second.

But I don't know how to remove Y value in a efficiency way.

I write a function to get Y Max,Min By X value:

def Get_Y_By_Xaxis(Mydict,X_value):
    temp_dict = {k for k in Mydict if k[0] == X_value }
    max_group = tuple(map(max,*temp_dict ))
    min_group = tuple(map(min,*temp_dict ))
    return (min_group[1],max_group[1])

And start to remove Y value. I do it like this

for i in range(X_min ,X_max+1 ):
    limits = Get_Y_By_Xaxis(Mydict , i)
    remove_keys =[k for k in Mydict
                  if (k[0] == i and k[1] > limits[0] - cut_num  )
                  or (k[0] == i and k[1] < limits[1] + cut_num  )]
    for key in remove_keys:
        del Mydict[key]

The will cost about 60~70 second for a dictionary with 150000 item

Is this code will become faster or do I have to use some package to help me?

martineau
  • 119,623
  • 25
  • 170
  • 301
Khyas
  • 43
  • 2
  • Could you describe what you are trying to achieve? Your algorithm seems not to be optimal, because you scan many time the whole keys, but I cannot propose you a better way because I am unsure of the expected result. – Serge Ballesta Jan 06 '17 at 12:41
  • 2
    Your performance issues are with having to iterate through all keys, not with deletion. You using coordinate tuples as keys but then need to address [geometry computations](https://en.wikipedia.org/wiki/Computational_geometry). A dictionary is not the right data structure here. – Martijn Pieters Jan 06 '17 at 12:47
  • Find the maximum and minimum XY values and from the edge shrink it – Khyas Jan 06 '17 at 12:49
  • 1
    According to this the average complexity of deleting a key is `O(1)`: https://wiki.python.org/moin/TimeComplexity . Thus it is unlikely that the deletion is the bottle neck. Nevertheless, if you have a large number of keys to remove, it might make more sense to gather the keys to remove into a *set* and create a new dictionary without the keys. Something like `Mydict = {k:v for k,v in Mydict.items() if not k in remove_keys}` – John Coleman Jan 06 '17 at 12:50
  • I suggest you profile your script and see where it's spending most of its time. See [_How can you profile a Python script?_](http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) on this site. – martineau Jan 06 '17 at 13:52
  • Have you benchmarked against generating the equation for the circle and then just computing the value(s) of y for each value of x? Is there a reason why you have to store all the coordinates in memory? – snakecharmerb Jan 08 '17 at 22:56

0 Answers0