In my application, I need a fast lookup of attributes. In this case, the attributes are a composition of a string and a list of dictionaries. These attributes are stored in a wrapper class. Let's call this wrapper class Plane:
    class Plane(object):
        def __init__(self, name, properties):
            self.name = name
            self.properties = properties

        @classmethod
        def from_idx(cls, idx):
            if idx == 0:
                return cls("PaperPlane", [{"canFly": True}, {"isWaterProof": False}])
            if idx == 1:
                return cls("AirbusA380", [{"canFly": True}, {"isWaterProof": True}, {"hasPassengers": True}])
To make it easier to play with this class, I added a simple classmethod that constructs instances from an integer.
So now in my application I have many Planes, on the order of 10,000,000. Each of these planes can be accessed by a universally unique identifier (uuid). What I need is a fast lookup: given a uuid, which Plane does it belong to? The natural solution is a dict. A simple class that generates planes with uuids in a dict and stores this dict in a file may look like this:
    import gzip
    import pickle
    import uuid

    import numpy as np

    class PlaneLookup(object):
        def __init__(self):
            # Maps uuid hex string -> Plane instance
            self.plane_dict = {}

        def generate(self, n_planes):
            for i in range(n_planes):
                plane_id = uuid.uuid4().hex
                # randint(0, 2) returns 0 or 1 (upper bound is exclusive)
                self.plane_dict[plane_id] = Plane.from_idx(np.random.randint(0, 2))

        def save(self, filename):
            with gzip.open(filename, 'wb') as f:
                pickle.dump(self.plane_dict, f, pickle.HIGHEST_PROTOCOL)

        @classmethod
        def from_disk(cls, filename):
            pl = cls()
            with gzip.open(filename, 'rb') as f:
                pl.plane_dict = pickle.load(f)
            return pl
So what happens if I generate some planes?

    pl = PlaneLookup()
    pl.generate(1000000)
A lot of memory gets consumed! If I check the size of my pl object with the getsize() method from this question, I get a value of 1,087,286,831 bytes on my 64-bit machine. Looking at htop, my memory demand seems to be even higher (around 2 GB).
This question explains quite well why Python dictionaries need so much memory.
However, I think this does not have to be the case in my application. The Plane object created in the PlaneLookup.generate() method very often has the same attributes (i.e. the same name and the same properties). So it should be possible to store such an object only once and, whenever the same object (same name, same properties) is created again, store only a reference to the already existing object. As a simple Plane object has a size of 1147 bytes (according to the getsize() method), storing just references may save a lot of memory!
The question is now: How do I do this? In the end I need a function that takes a uuid as an input and returns the corresponding Plane object as fast as possible with as little memory as possible.
Maybe lru_cache can help?
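What I have in mind is roughly the following sketch. It wraps a factory function in `functools.lru_cache`, so that repeated calls with the same argument return the same object instead of building a new one. The function `cached_plane` is a hypothetical stand-in for `Plane.from_idx`, and I am not sure this is the right tool, since `lru_cache` only works when the arguments are hashable (an integer index is, a list of dicts is not):

```python
import functools


class Plane(object):
    def __init__(self, name, properties):
        self.name = name
        self.properties = properties


@functools.lru_cache(maxsize=None)
def cached_plane(idx):
    # Hypothetical helper mirroring Plane.from_idx.
    # The result is cached per idx, so each Plane is built only once.
    if idx == 0:
        return Plane("PaperPlane", [{"canFly": True}, {"isWaterProof": False}])
    if idx == 1:
        return Plane(
            "AirbusA380",
            [{"canFly": True}, {"isWaterProof": True}, {"hasPassengers": True}],
        )
    raise ValueError("unknown plane index: %r" % (idx,))


a = cached_plane(0)
b = cached_plane(0)
same_object = a is b  # both names refer to the single cached instance
```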
Here is again the full code to play with: https://pastebin.com/iTZyQQAU