6

I need a dictionary data structure that store dictionaries as seen below:

custom = {1: {'a': np.zeros(10), 'b': np.zeros(100)}, 
          2: {'c': np.zeros(20), 'd': np.zeros(200)}}

But the problem is that I iterate over this data structure many times in my code. Every time I iterate over it, I need the order of iteration to be respected because all the elements in this complex data structure are mapped to a 1D array (serialized if you will), and thus the order is important. I thought about writing a ordered dict of ordered dict for that matter, but I'm not sure this is the right solution as it seems I may be choosing the wrong data structure. What would be the most adequate solution for my case?

UPDATE

So this is what I came up with so far:

class Test(list):

    def __init__(self, *args, **kwargs):

        super(Test, self).__init__(*args, **kwargs)

        for k,v in args[0].items():
            self[k] = OrderedDict(v)

        self.d = -1
        self.iterator = iter(self[-1].keys())
        self.etype = next(self.iterator)
        self.idx = 0


    def __iter__(self):
        return self

    def __next__(self):

        try:
            self.idx += 1
            return self[self.d][self.etype][self.idx-1]

        except IndexError:

            self.etype = next(self.iterator)
            self.idx = 0
            return self[self.d][self.etype][self.idx-1]

    def __call__(self, d):

        self.d = -1 - d
        self.iterator = iter(self[self.d].keys())
        self.etype = next(self.iterator)
        self.idx = 0
        return self


def main(argv=()):

    tst = Test(elements)
    for el in tst:
        print(el)
    # loop over a lower dimension
    for el in tst(-2):
        print(el)

    print(tst)


    return 0

if __name__ == "__main__":
    sys.exit(main())

I can iterate as many times as I want in this ordered structure, and I implemented __call__ so I can iterate over the lower dimensions. I don't like the fact that if there isn't a lower dimension present in the list, it doesn't give me any errors. I also have the feeling that every time I call return self[self.d][self.etype][self.idx-1] is less efficient than the original iteration over the dictionary. Is this true? How can I improve this?

aaragon
  • 2,314
  • 4
  • 26
  • 60

4 Answers4

2

I think using OrderedDicts is the best way. They're built-in and relatively fast:

custom = OrderedDict([(1, OrderedDict([('a', np.zeros(10)),
                                       ('b', np.zeros(100))])),
                      (2, OrderedDict([('c', np.zeros(20)),
                                       ('d', np.zeros(200))]))])

If you want to make it easy to iterate over the contents of the your data structure, you can always provide a utility function to do so:

def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            for row in v:
                yield row

Note that in Python 3.3+, which allows yield from <expression>, the last for loop can be eliminated:

def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            yield from v

With one of those you'll then be able to write something like:

for elem in iter_over_contents(custom):
    print(elem)

and hide the complexity.

While you could define your own class in an attempt to encapsulate this data structure and use something like the iter_over_contents() generator function as its __iter__() method, that approach would likely be slower and wouldn't allow expressions using two levels of indexing such this following:

custom[1]['b']

which using nested dictionaries (or OrderedDefaultdicts as shown in my other answer) would.

martineau
  • 119,623
  • 25
  • 170
  • 301
2

Here's another alternative that uses an OrderedDefaultdict to define the tree-like data structure you want. I'm reusing the definition of it from another answer of mine.

To make use of it, you have to ensure the entries are defined in the order you want to access them in later on.

class OrderedDefaultdict(OrderedDict):
    def __init__(self, *args, **kwargs):
        if not args:
            self.default_factory = None
        else:
            if not (args[0] is None or callable(args[0])):
                raise TypeError('first argument must be callable or None')
            self.default_factory = args[0]
            args = args[1:]
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)

    def __missing__ (self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = default = self.default_factory()
        return default

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, self.iteritems()

Tree = lambda: OrderedDefaultdict(Tree)

custom = Tree()
custom[1]['a'] = np.zeros(10)
custom[1]['b'] = np.zeros(100)
custom[2]['c'] = np.zeros(20)
custom[2]['d'] = np.zeros(200)

I'm not sure I understand your follow-on question. If the data structure is limited to two levels, you could use nested for loops to iterate over its elements in the order they were defined. For example:

for key1, subtree in custom.items():
    for key2, elem in subtree.items():
        print('custom[{!r}][{!r}]: {}'.format(key1, key2, elem))

(In Python 2 you'd want to use iteritems() instead of items().)

Community
  • 1
  • 1
martineau
  • 119,623
  • 25
  • 170
  • 301
  • Thanks for your solution. If you had to provide a more friendly user interface for iterating through the elements in your dictionary in order, is it possible? Say if I want to do `for el in custom:` and then going individually through each element in an ordered manner, how would you do this? – aaragon Dec 15 '15 at 18:05
  • What I meant is that it would be nice to traverse the entire data structure in order by using only one loop. I'm trying to do this by overriding the `__iter__` and `__next__` methods but I'm failing miserably. I also wanted to ask you if you could explain a bit the code you wrote because that's pretty advanced Python for me. – aaragon Dec 19 '15 at 10:55
  • For dictionaries, the `__iter__()` and `next()` (not `__next__`) methods just go over the dictionary's keys, however a recursive tree structure based on dictionaries could support more than one form of iteration — both breadth-first and depth-first traversal, for example. In addition, dictionaries have an `items()` method which returns copies of all the `(key, value)` pairs it contains. With those differences it's unclear what you're trying to accomplish when you say only that you want to iterate through the elements in order. What order and what do you consider an "element" in this context? – martineau Dec 19 '15 at 18:25
  • The code in my answer is an example of implementing [_autovivification_](https://en.wikipedia.org/wiki/Autovivification) in Python and was derived from the Python code shown in the linked Wikipedia article (which contains some additional references). – martineau Dec 19 '15 at 18:25
  • What I would like to have is the data structure you propose, but then instead of iterating throw the dictionary elements by using two loops as you show at the end of your answer, I would like the user to type `for i in custom:`, and since you're using ordered dictionaries the order is always the same when traversing the dictionary. Do you think this can be done by overriding `__iter__` and `next()`? – aaragon Dec 19 '15 at 18:57
  • You could do it by providing custom `__iter__()` and `next()` methods. However I would suggest you implement them as part of separate higher-level container class that internally uses the `Tree` function shown in my answer. This is because it sounds like you want to iterate over the values in the container, not the keys associated with them as would be the case with ordinary Python dictionaries and their `__iter__` methods — in other words since your data structure really isn't one. – martineau Dec 19 '15 at 20:23
  • Yes, something like that. So you suggest I inherit from your class or that I use aggregation? I want to iterate over the values of the dictionary indeed, but internally I want the values to be ordered with the dictionary. So if I understand correctly, I would create another class, where I have your data structure as a variable and then implement `__iter__` and `next()` methods to do the iteration, is this correct? – aaragon Dec 20 '15 at 10:10
  • I think it would be best to represent it as a _has a_, not an _is a_ relationship — so would try to use object composition (not [aggregation](https://en.wikipedia.org/wiki/Object_composition#Aggregation)) rather than inheritance. However that may be difficult to do because of the multiple levels of indexing. But, yes, you otherwise appear to understand what I suggested. – martineau Dec 20 '15 at 13:00
  • Hi again. What do you think of the updated solution I posted? – aaragon Dec 20 '15 at 16:48
  • If you're concerned about performance, [profile](https://docs.python.org/2/library/profile.html#module-cProfile) your code. One possible issue I see is that it doesn't allow more than one iteration of the data structure to be executing at the same time since it explicitly keeps track of its place using instance attributes as opposed to providing a generator function (one containing a `yield` statement). If you'd like your code reviewed, I suggest you post it on stackexchange's [Code Review](http://codereview.stackexchange.com) website. – martineau Dec 20 '15 at 19:08
  • @aaragon We are happy to improve code at CodeReview, but be sure that it works correctly before posting – Caridorc Dec 20 '15 at 19:11
  • Done! The new question has been posted [here](http://codereview.stackexchange.com/questions/114568/iterating-over-the-values-of-a-list-of-ordered-dictionaries). – aaragon Dec 20 '15 at 19:39
  • I'm still trying to understand the code of the `OrderedDefaultdict` you posted. I'm confused about the `Tree = lambda: OrderedDefaultdict(Tree)` line. What exactly is this doing? – aaragon Dec 21 '15 at 09:02
  • It's the recursive definition of a function that creates a tree-like structure comprised of `OrderedDefaultdict`s. Its `defaultdict` base class constructor is effectively a factory method and requires an argument that's "callable". This is often the name of an existing class, like `list`, which can be called with no arguments and returns an instance. However in this case it's a function...which passes itself as the argument in the factory call. So whenever an attempt is made to access a value using a key that doesn't already exist, a new empty `OrderedDefaultdict(Tree)` is added as its value. – martineau Dec 21 '15 at 11:55
1

Could you just use a list of dictionaries?

custom = [{'a': np.zeros(10), 'b': np.zeros(100)},
          {'c': np.zeros(20), 'd': np.zeros(200)}]

This could work if the outer dictionary is the only one you need in the right order. You could still access the inner dictionaries with custom[0] or custom[1] (careful, indexing now starts at 0).

If not all of the indices are used, you could do the following:

custom = [None] * maxLength   # maximum dict size you expect

custom[1] = {'a': np.zeros(10), 'b': np.zeros(100)}
custom[2] = {'c': np.zeros(20), 'd': np.zeros(200)}
Lisa
  • 3,365
  • 3
  • 19
  • 30
  • I can't use this because 1 may or may not be present, and the same goes for 0, and 2. – aaragon Dec 15 '15 at 16:41
  • Ah ok, so you actually need the keys in the outer `dict` - never mind, sorry for the misunderstanding! – Lisa Dec 15 '15 at 16:42
  • @aaragon I edited the answer to preserve the indices which were the keys of your outer `dict` and set all the unavailable elements to `None`. – Lisa Dec 15 '15 at 16:45
  • Your solution is still not sorted, as looping for example over `custom[1]` can get the elements of `a` and then `b`, and in a different iteration the elements `b` and then `a`. This could be solved by using an `OrderedDict` but then the question still remains: Is there a better way to handle this problem? – aaragon Dec 15 '15 at 16:49
0

You can fix the order of your keys while iterating when you sort them first:

for key in sorted(custom.keys()):
    print(key, custom[key])

If you want to reduce sorted()-calls, you may want to store the keys in an extra list which will then serve as your iteration order:

ordered_keys = sorted(custom.keys())
for key in ordered_keys:
    print(key, custom[key])

You should be ready to go for as many iterations over your data structure, as you need.

jbndlr
  • 4,965
  • 2
  • 21
  • 31
  • I iterate lots of times over this data structure. – aaragon Dec 15 '15 at 16:42
  • The structure is iterated over many times throughout the application and what I want to do is to provide a more user-friendly approach to type `for k,v in custom.items(): for i,r in enumerate(v): # etc.` – aaragon Dec 15 '15 at 16:53
  • Well, this seems like a completely different question. – jbndlr Dec 15 '15 at 16:54
  • It is indeed, take a look [here](http://stackoverflow.com/questions/33976175/python-equivalent-of-c-begin-and-end-for-custom-classes), but I have to think carefully about this, and thus I start by choosing the right data structure. – aaragon Dec 15 '15 at 16:57