0

I have a list of dictionaries=

a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4},...]

"ID" is a unique identifier for each dictionary. Considering the list is huge, what is the fastest way of checking if a dictionary with a certain "ID" is in the list, and if not append to it? And then update its "VALUE" ("VALUE" will be updated if the dict is already in list, otherwise a certain value will be written)

alwbtc
  • 28,057
  • 62
  • 134
  • 188
  • 3
    Why is `a` a list and not a dictionary itself? – Martijn Pieters Jul 15 '14 at 16:33
  • 1
    Can you not convert the list into a dict of `{ID: VALUE}` pairs? – wflynny Jul 15 '14 at 16:33
  • I don't know, isn't using a list more convenient? – alwbtc Jul 15 '14 at 16:33
  • 2
    No, a dictionary is far more efficient and convenient to map unique identifiers to other objects. A list requires scanning sequentially each time. – Martijn Pieters Jul 15 '14 at 16:34
  • How would you write the ID of an inner dict then? – alwbtc Jul 15 '14 at 16:35
  • 1
    @alwbtc: do your dictionaries contain **just** an `ID` and `VALUE` key? then you don't need inner dictionaries at all, you'd use one dictionary instead. – Martijn Pieters Jul 15 '14 at 16:36
  • It may contain more keys. I still don't understand how to use one dictionary. Think of each inner dictionary as a person, each has ID number, some money in bank account, and maybe the city they live in. – alwbtc Jul 15 '14 at 16:38
  • @alwbtc: then just map the id to the contained dictionary. That contained dictionary *doesn't have to have the id again*. – Martijn Pieters Jul 15 '14 at 16:41
  • Even then, why wouldn't you structure your data as a dict of dicts: `{1:{"VALUE":1, "MONEY":100,...}, 2:{"VALUE":2, "MONEY":0, ...}, ...}`. Then checking for whether the `ID` exists is trivial: `if id not in d: ...`. – wflynny Jul 15 '14 at 16:41

5 Answers5

5

You'd not use a list. Use a dictionary instead, mapping ids to nested dictionaries:

a = {
    1: {'VALUE': 2, 'foo': 'bar'},
    42: {'VALUE': 45, 'spam': 'eggs'},
}

Note that you don't need to include the ID key in the nested dictionary; doing so would be redundant.

Now you can simply look up if a key exists:

if someid in a:
    a[someid]['VALUE'] = newvalue

I did make the assumption that your ID keys are not necessarily sequential numbers. I also made the assumption you need to store other information besides VALUE; otherwise just a flat dictionary mapping ID to VALUE values would suffice.

A dictionary lets you look up values by key in O(1) time (constant time independent of the size of the dictionary). Lists let you look up elements in constant time too, but only if you know the index.

If you don't and have to scan through the list, you have a O(N) operation, where N is the number of elements. You need to look at each and every dictionary in your list to see if it matches ID, and if ID is not present, that means you have to search from start to finish. A dictionary will still tell you in O(1) time that the key is not there.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • OK, I thought dictionary is a slow array in python?? – alwbtc Jul 15 '14 at 16:44
  • dicts are much faster for certain operations: fetching is `O(1)` for example. See http://stackoverflow.com/questions/1418588/how-expensive-are-python-dictionaries-to-handle – wflynny Jul 15 '14 at 16:47
  • @alwbtc: what gave you that impression? Direct index lookups in a list are fast, but a key lookup in a dictionary is a O(1) operation *too*. If you have to search through a list, however, you don't have a direct index and you have to do a scan, a worst-case scenario of O(N) comparisons. – Martijn Pieters Jul 15 '14 at 16:47
2

If you can, convert to a dictionary as the other answers suggest, but in case you you have reason* to not change the data structure storing your items, here's what you can do:

items = [{"ID":1, "VALUE":2}, {"ID":2, "VALUE":2}, {"ID":3, "VALUE":4}]

def set_value_by_id(id, value):
    # Try to find the item, if it exists
    for item in items:
        if item["ID"] == id:
            break

    # Make and append the item if it doesn't exist
    else:  # Here, `else` means "if the loop terminated not via break"
        item = {"ID": id}
        items.append(id)

    # In either case, set the value
    item["VALUE"] = value

* Some valid reasons I can think of include preserving the order of items and allowing duplicate items with the same id. For ways to make dictionaries work with those requirements, you might want to take a look at OrderedDict and this answer about duplicate keys.

Community
  • 1
  • 1
interestinglythere
  • 1,230
  • 13
  • 16
0

Convert your list into a dict and then checking for values is much more efficient.

d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
    if new_key not in d:
        d[new_key] = new_value
wflynny
  • 18,065
  • 5
  • 46
  • 67
0

Also need to update on key found:

d = dict((item['ID'], item['VALUE']) for item in a)

for new_key, new_value in new_items:
    d.setdefault(new_key, 0)
    d[new_key] = new_value
gkusner
  • 1,244
  • 1
  • 11
  • 14
  • Why do you need to set the default value for `new_key` to `0`, only then to set the value to `new_value`? – wflynny Jul 15 '14 at 16:52
  • You're creating a lot of processing and memory overhead by creating a lot of tuples and doing a lot of potentially unnecessary VALUE lookups. – TessellatingHeckler Jul 15 '14 at 16:53
  • yes probably true i was thinking about the fact you always need to update with value - and setdefault is supposed to be efficient for possible adds - we have used this construct for years on very large data sets and it works very well / quickly – gkusner Jul 15 '14 at 16:55
0

Answering the question you asked, without changing the datastructure around, there's no real faster way of looking without a loop and checking every element and doing a dictionary lookup for each one - but you can push the loop down to the Python runtime instead of using Python's for loop.

I haven't tried if it ends up faster though.

a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4}]
id = 2

tmp = filter(lambda d: d['ID']==id, a)

# the filter will either return an empty list, or a list of one item. 
if not tmp:
    tmp = {"ID":id, "VALUE":"default"}
    a.append(tmp)
else:
    tmp = tmp[0]

# tmp is bound to the found/new dictionary
TessellatingHeckler
  • 27,511
  • 4
  • 48
  • 87