5

I'm building a class with amongst others a dictionary with integer keys and list values. Adding values to this dictionary seems to be a real bottleneck though and I was wondering whether there might be some way to speed up my code.

class myClass():

  def __init__(self):
    self.d = defaultdict(list)

  def addValue(self, index, value):
    self.d[index].append(value)

Is this really the optimal way of doing this? I don't really care about the order of the values, so perhaps there is a more suitable data structure out there with a faster append. Then again, 'append' doesn't seem to be the main problem, because if I simply append to an empty list, the code is a lot faster. I guess it's the loading of the previously stored list that takes up most of the time?


I found out that the problem is not in the dict, but in the list append (although I claimed otherwise in my original post, for which I apologize). This problem is due to a bug in Python's garbage collector, which is well explained on this other question. Disabling the gc before adding all the values and then re-enabling it, speeds up the process immensely!

Community
  • 1
  • 1
niefpaarschoenen
  • 560
  • 1
  • 8
  • 19
  • 2
    Adding items to a list and getting values from a object or a dict all take no time. To speed up a program you find the bottleneck by profiling, not by changing random pieces of code. – Jochen Ritzel Jun 20 '12 at 12:26
  • Is mapping items to existing keys significantly faster than adding values to new keys? – Adam Matan Jun 20 '12 at 12:35
  • I just found out that the problem is not in the dict, but in the list append (although I claimed otherwise in my original post, for which I apologize). Then I found the answer to my question on http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower. Since I'm new to this site, I don't know what the standard procedure is in this case: should I remove my original post? Or add the above details and answer to the post? – niefpaarschoenen Jun 20 '12 at 12:45

3 Answers3

2

Compare it to this:

class myClass():

  def __init__(self):
    self.d = {}

  def addValue(self, index, value):
    self.d.setdefault(index, []).append(value)
eumiro
  • 207,213
  • 34
  • 299
  • 261
1

They say "Better to ask for forgiveness than for permission.". Now you're not asking for permission personally, but I thought maybe defaultdict does, and that's what slowing it down.

try this:

class myClass():

  def __init__(self):
    self.d = {}

  def addValue(self, index, value):
    try:
        self.d[index].append(value)
    except KeyError:
        self.d[index] = [value]

This tries to access the index key in the dictionary, if it doesn't exist it will raise a KeyError, and act upon it.

Is it any faster?

jadkik94
  • 7,000
  • 2
  • 30
  • 39
  • I've tried to compare your code and code from question (using [timeit](http://docs.python.org/library/timeit.html)). I've used this test: `my = myClass() my.addValue(3, "ab") my.addValue(3, "cd") my.addValue(4, "ef") my.addValue(4, "gh")` And original code is faster! On my machine 24.66 usec for your code and 18.10 usec for code from question. So looks like this approach is not the answer.. – stalk Jun 20 '12 at 12:37
  • 1
    Seems that you have the fastest solution then :) – jadkik94 Jun 20 '12 at 12:42
0

As a conclusion I can say that my code in the original question is faster than or as fast as all the other suggestions.

niefpaarschoenen
  • 560
  • 1
  • 8
  • 19