
I would like to create a lookup table, and for that I am thinking of using a dictionary. The keys would correspond to an integer (or, in my case, to an enumerated type from the class Enum), and the values would be 2, 3, or 4 numpy arrays. However, I'm reluctant to use this approach because the dictionary would hold a huge amount of information, 99% of which may never be used for some problems. It doesn't make sense to build a single object containing all the lookup information, and even though I'm speculating, I'm almost certain there's a better approach to accomplish what I want to do.

Coming from the C++ world, I would create an unordered_map from enum values to function pointers, where each function creates a static array (so that it is created only once) and returns a pointer to it. That way, only the parts of the lookup table that the program actually needs get instantiated, instead of the whole thing.

I'm trying to do something similar in Python, so I would like to know the most efficient way to accomplish this.

EDIT

This is what I came up with so far. It mixes the suggestions made by @AaronDigulla and @DanielRoseman, although the @run_once decorator may no longer be necessary. A subclass of dict overrides the __getitem__ method and checks whether a key exists in the dictionary. If it doesn't, it calls a function (found via eval() on a string built from the dictionary key values). I would appreciate any improvements to the given code. It looks quite complex but it works, so I'm wondering if it can be simplified further.

import collections
import numpy as np

Quadrature = collections.namedtuple("Quadrature", "wgt xi eta zeta")

category_map = { "t3" : "tri" #... more types
               }


class Integrator(dict):

  def __init__(self, *args, **kwargs):
    self.update(*args, **kwargs)

  def __getitem__(self, key):

    if key not in self:

      fn = '{}_{}gp'.format(category_map[key[0]], key[1])
      super().__setitem__(key, eval(fn)())

    return super().__getitem__(key)

  def __repr__(self):
    dictrepr = dict.__repr__(self)
    return '%s(%s)' % (type(self).__name__, dictrepr)

  def update(self, *args, **kwargs):
    print('update', args, kwargs)
    for k, v in dict(*args, **kwargs).items():
        self[k] = v

def run_once(f):
  def wrapper(*args, **kwargs):
    if not wrapper.has_run:
      wrapper.has_run = True
      return f(*args, **kwargs)
  wrapper.has_run = False
  return wrapper


@run_once
def tri_3gp():
  xi   = np.array([2/3., 1/6., 1/6.])
  eta  = np.array([1/6., 1/6., 2/3.])
  wgt  = np.array([2/3., 2/3., 2/3.])
  return Quadrature(wgt, xi, eta, None)
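For comparison, here is a sketch of how the same lazy caching might be simplified with `dict.__missing__` (which the dict machinery calls automatically on a cache miss) and an explicit registry of builder functions instead of `eval()`. The `BUILDERS` name and the key layout are my own assumptions, not part of the original code:

```python
import collections
import numpy as np

Quadrature = collections.namedtuple("Quadrature", "wgt xi eta zeta")

def tri_3gp():
    # 3-point quadrature rule for a triangle (values as in the question)
    xi  = np.array([2/3., 1/6., 1/6.])
    eta = np.array([1/6., 1/6., 2/3.])
    wgt = np.array([2/3., 2/3., 2/3.])
    return Quadrature(wgt, xi, eta, None)

# explicit registry: map each key to its builder, no eval() needed
BUILDERS = {("t3", 3): tri_3gp}

class Integrator(dict):
    def __missing__(self, key):
        # called only when key is absent; build, cache, and return
        val = BUILDERS[key]()
        self[key] = val
        return val

table = Integrator()
quad = table[("t3", 3)]          # built on first access
assert table[("t3", 3)] is quad  # served from the cache afterwards
```

With `__missing__` there is no need to override `__getitem__`, and a missing builder raises a plain `KeyError`, which also removes the need for the run_once decorator.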
aaragon
  • Try out dictionaries; they work as hashmaps and have good performance. Are they actually too slow with your dataset? (Have you tested it?) You could even override one to "cache" some results if you only need a few, and thus increase lookup speed. – Etse Sep 05 '14 at 07:42
  • Does huge mean that it still fits into memory? And where do your values come from? Do you already know memcached/memoize as decorators? Maybe have a look here for an example: http://blog.isotoma.com/2009/09/of-python-memcached-and-decorators-easy-peasy-function-caching/ – Argeman Sep 05 '14 at 07:43
  • [This question](http://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python) will probably be useful to you, as well as [this implementation as a function decorator](https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize). – Carsten Sep 05 '14 at 07:44
  • @Argeman it would still fit in memory, but it would definitely be a waste, as most of it won't be used by the final program. – aaragon Sep 05 '14 at 08:10
  • I would say, not using the memory is a waste. So unless you need the memory for something else it should be fine. – Etse Sep 05 '14 at 11:11

2 Answers


You can do exactly the same thing in Python. In fact, it's even easier, as functions are first-class objects themselves: you can store them in a dictionary and call them as necessary.

To replace the static array, you could use some kind of memoizing, such as a global lookup dictionary.

global_dict = {}

def func1():
    if 'param1' not in global_dict:
        global_dict['param1'] = my_complicated_function_for_param_1()
    return global_dict['param1']



lookup_dict = {
    'param1': func1,
    ...
}

# now do the lookup:
my_result = lookup_dict[my_param]()

You may of course want to factor out the logic from the calculation functions: a decorator could be a good approach.
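For instance, a minimal memoizing decorator could factor the caching out of each function. This is my own sketch, not code from the answer; `func1` here simply stands in for an expensive computation:

```python
import functools

def memoize(fn):
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        # compute once per argument tuple, then reuse the cached result
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

calls = []

@memoize
def func1():
    calls.append(1)          # side effect to show the body runs only once
    return [1.0, 2.0, 3.0]   # stands in for an expensive computation

lookup_dict = {'param1': func1}

my_result = lookup_dict['param1']()   # computed here
my_result = lookup_dict['param1']()   # served from the cache
assert len(calls) == 1
```

This keeps each calculation function free of caching boilerplate while preserving the function-pointer-style dispatch through `lookup_dict`.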

Daniel Roseman

That's pretty simple to do in Python. See this question on how to create "run once" decorators: Efficient way of having a function only execute once in a loop

Now you can put the functions that create the data in the map. The decorator will make sure each one runs at most once (the first time you call it). Then the lookup would look like this:

@run_once
def funcForKey():
    ...

lookup_dict = {
    'key': funcForKey,
    ...
}

result = lookup_dict[x]()

The () after the [] calls the function.

You also may want to try a class:

class Data(object):
    @run_once
    def key(self):
        ...

data = Data()

now you can look up values like so:

a = 'key'
result = getattr(data, a)()

or, if the name is constant at runtime, simply:

result = data.key()
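Putting the pieces together, here is a runnable sketch of the class-based variant. Note that the run_once decorator from the linked question returns None on repeated calls, so this version (my own adaptation, not the answer's code) caches the result and returns it instead:

```python
def run_once(f):
    # variant of run_once that caches and returns the result on
    # repeated calls instead of returning None
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            wrapper.result = f(*args, **kwargs)
        return wrapper.result
    wrapper.has_run = False
    return wrapper

class Data(object):
    @run_once
    def key(self):
        return [1.0, 2.0, 3.0]   # stands in for an expensive computation

data = Data()

a = 'key'
result = getattr(data, a)()          # computed on the first call
assert getattr(data, a)() is result  # cached on later calls
```

One caveat with this sketch: the cache lives on the wrapper function, so it is shared across all instances of `Data`; for per-instance caching you would store the result on `self` instead.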
Aaron Digulla