2

So I have to loop through a list of objects, using some of their values to do computation, and then assign them new values.

Because many of the items in the list will be assigned the same new value, I used a dictionary to hold the list of items that will require the same value. For example:

item_dict = {}

for item in list:
    value = item.value
    if value not in item_dict:
        item_dict[value] = [item]
    else:
        item_dict[value].append(item)

# do some calculations base on values

new_data # some dictionary created by computation
# new data is stored new_data[value] = new_value

for value, new_value in new_data.items():
    items = item_dict[value]
    for item in items:
        item.value = new_value

I was think about removing the for item in items loop with a decorator since all the new_value(s) for that list are the same. For example:

def dec(item):
    def wrap(value):
        item.value = value
    return wrap

def rec(item, func):
    def wrap(value):
        item.value = value
        func(value)
    return wrap

item_dict = {}

for item in list:
    value = item.value
    if value not in item_dict:
        item_dict[value] = dec(item)
    else:
        item_dict[value] = rec(item, item_dict[value])

# do some calculations base on values

new_data # some dictionary created by computation
# new data is stored new_data[value] = new_value

for value, new_value in new_data.items():
    items = item_dict[value]
    items(new_value)

Would the decorator fashion be more efficient and how much of a memory impact will it have? Are there any better ways of doing this?

Derek
  • 11,980
  • 26
  • 103
  • 162
  • The `rec` function, at a glance, looks like a no op. What's its purpose? – Eric May 30 '13 at 22:26
  • whoops, let me fix that – Derek May 30 '13 at 22:27
  • 1
    Do you need `items_dict` to do your calculation? If you just need to update all the item values based on `new_data` I think it would be clearer to just iterate through your list once, looking up the new value each time. Dictionary lookups are fast. – Peter DeGlopper May 30 '13 at 22:30
  • @PeterDeGlopper I'm actually storing the id values of generic objects from items to perform a query like GenericModel.objects.filter(id__in=values), So I need to iterate through all the items to get all the ids first so I can perform the query. So the new_data would be new_data[content_type] = [list of queried content objects] – Derek May 30 '13 at 22:34
  • Do you mean `item_dict` rather than `new_data` there? Your original code has just a single value in `new_data` for each content type. I still think building `item_dict` is unnecessary - you're doing at least one dictionary insert per item so that you can cut down on the number of dictionary lookups later, but a straight iteration through your list of items would do only one dictionary lookup per item. – Peter DeGlopper May 30 '13 at 22:46
  • the example above isn't structurally the same as what I'm using it for. I described the queryset example to explain why I needed the `item_dict`, but, of course, the example `item_dict` and the queryset `item_dict` are not the same. I thought using a simpler `item_dict` would be better for an example. – Derek May 30 '13 at 22:54

2 Answers2

1

A defaultdict works well here:

from collections import defaultdict

item_dict = defaultdict(list)

for item in value_list:
    item_dict[item.value].append(item)

# do some calculations base on values

new_data # some dictionary created by computation
# new data is stored new_data[value] = new_value

for value, new_value in new_data.items():
    for item in item_dict[value]:
        item.value = new_value

I struggle to think of a way the decorator version could be better - for one thing, you have to worry about the recursion limit.

Eric
  • 95,302
  • 53
  • 242
  • 374
  • can you please explain why it is better than the decorator method, i.e. how much of a performance boost and why? thanks – Derek May 30 '13 at 22:31
  • @Derek: The decorator method obviously requires more memory, as you're creating a new function object with closures every loop. My answer doesn't really differ in performance from your iteration solution - it just is a more concise way to express it. – Eric May 30 '13 at 22:34
0

The get method works well in the first case.

item_dict = {}

for item in list:
    item_dict[item.value] = item_dict.get(item.value, []) + [item]

The key to making this work is to use list addition instead of append, as append returns None.

yardsale8
  • 940
  • 9
  • 15
  • 1
    If you're going to go that way, `set_default` is probably better. `item_dict.set_default(item.value, []).append(item)` – Peter DeGlopper May 30 '13 at 23:00
  • Could you elaborate? Is your method more efficient? Wouldn't you be setting the default value every time we see item.value? – yardsale8 May 30 '13 at 23:02
  • You are correct, for more information see [this question](http://stackoverflow.com/questions/7423428/python-dict-get-vs-setdefault) – yardsale8 May 31 '13 at 02:09