
I'm keeping a record of what I'm calling profiles.

Each profile is a tuple of dictionaries, (Dict, Dict), and we can associate a unique ID with it (which may include characters like _ or -).

I need to keep them in RAM for a certain time, because I'll need to search and update some of them; only at the end, when I no longer need the whole set of profiles, will I flush them to persistent storage.

Currently, I'm using a dictionary/hash table to keep all of them (around 100K elements, but it could be more), since I'll do many searches, with id:profile as the key:value pair.

The data looks similar to this:

{
"Aw23_1adGF": ({
     "data":
        {"contacts": [11, 22], "address": "usa"},
     "meta_data":
        {"created_by": "John"}
    }, {"key_3": "yes"}),
"AK23_1adGF": ({
     "data":
        {"contacts": [33, 44], "address": "mexico"},
     "meta_data":
        {"created_by": "Juan"}
    }, {"key_3": "no"}),
# ...
}
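To make the problem concrete (as one of the comments suggests), here is a minimal sketch that builds mock profiles of the shape above and measures what the structure costs, using only the standard library's `tracemalloc`. The field values and ID length are illustrative assumptions.

```python
import random
import string
import tracemalloc

def random_id(n=10):
    # IDs may contain characters like _ or -
    alphabet = string.ascii_letters + string.digits + "_-"
    return "".join(random.choice(alphabet) for _ in range(n))

def make_profile():
    # Mock data shaped like the example above; values are arbitrary.
    return (
        {"data": {"contacts": [random.randint(0, 99), random.randint(0, 99)],
                  "address": random.choice(["usa", "mexico"])},
         "meta_data": {"created_by": random.choice(["John", "Juan"])}},
        {"key_3": random.choice(["yes", "no"])},
    )

tracemalloc.start()
profiles = {random_id(): make_profile() for _ in range(100_000)}
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```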

Once this data structure is built, I don't need to add or delete any more elements. I build it once, then search it many times. On certain occasions I'll need to update some element in the dictionaries that compose a profile. However, building this data object contributes to the peak RAM usage that I'm trying to reduce.

The problem is that the dictionary seems to use too much RAM.

What are my other options for data structures that could keep some of the search efficiency with a smaller RAM footprint?

I thought of an ordered list, since the IDs seem orderable to me (maybe except for characters like _ or -).

What data structures are there that could help me in this predicament?
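The ordered-list idea above can be sketched with the stdlib `bisect` module: keep the profiles in a list sorted by ID and binary-search it. This trades the dict's O(1) lookups for O(log n) searches but avoids the hash table's extra slots. The two profiles below are taken from the example data; everything else is illustrative.

```python
from bisect import bisect_left

# Build once: a list of (id, profile) tuples sorted by id.
profiles = sorted([
    ("Aw23_1adGF", ({"data": {"contacts": [11, 22], "address": "usa"},
                     "meta_data": {"created_by": "John"}}, {"key_3": "yes"})),
    ("AK23_1adGF", ({"data": {"contacts": [33, 44], "address": "mexico"},
                     "meta_data": {"created_by": "Juan"}}, {"key_3": "no"})),
])
keys = [pid for pid, _ in profiles]  # parallel key list for bisect

def lookup(pid):
    # Binary search: find the insertion point, then check for an exact match.
    i = bisect_left(keys, pid)
    if i < len(keys) and keys[i] == pid:
        return profiles[i][1]
    return None

print(lookup("AK23_1adGF"))  # the profile tuple
print(lookup("missing"))     # None
```

Note that updating an element in place works fine here (the profile dicts stay mutable); only inserting new IDs would be costly, which matches the build-once usage above.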

  • Maybe edit the question to include some code that demonstrates the problem. E.g. use the random module to create dummy data in a relevant manner. Your description is somewhat vague to me – Sam Mason Jan 08 '23 at 10:24
  • A dictionary is a very thin structure containing mostly pointers and hashes (i.e. different kinds of primitive numbers). If you observe significant memory usage, that's most likely your *data* taking up the memory, not the data *structure* around it. – MisterMiyagi Jan 08 '23 at 10:32
  • It looks like many keys are identical strings. Did you deduplicate these? – MisterMiyagi Jan 08 '23 at 10:46
  • @MisterMiyagi (first comment) The idea I have of dict in Python is that it takes a considerable amount of extra space to avoid hashing collisions, and every once in a while it has to be resized to improve collision probabilities. – An old man in the sea. Jan 08 '23 at 10:59
  • @Anoldmaninthesea. A `dict` does, but it contains only pointers/hashes. So the extra space is on the order of pointers, not data. You'll have some factor more *pointer* space not more *data* space. A 100k item `dict` vs `list` has a size of about 5MB vs 1 MB. – MisterMiyagi Jan 08 '23 at 11:00
  • @MisterMiyagi (second comment) Not sure what you mean... I re-edited the mock example with more 'real' data. The ID keys are different in both entries. – An old man in the sea. Jan 08 '23 at 11:00
  • @Anoldmaninthesea. The data keys all are identical: `data`, `contacts`, `address`, `meta_data`, etc. If you load these naively then each of them may be a separate string object. Just having the string `"data"` 100k times takes up as much memory as the outermost `dict`. – MisterMiyagi Jan 08 '23 at 11:04
  • @MisterMiyagi hum... and then I could apply something like a dawg or trie for strings? – An old man in the sea. Jan 08 '23 at 11:06
  • @Anoldmaninthesea. You would deduplicate them by keeping a lookup table (`dict`) to each "canonical" instance (the lookup happens by equality but the result is by identity). There's also [`sys.intern`](https://docs.python.org/3.12/library/sys.html#sys.intern). – MisterMiyagi Jan 08 '23 at 11:10
  • Thanks for the suggestions @MisterMiyagi I'll read more about them. – An old man in the sea. Jan 08 '23 at 11:22
  • You might be better off using a relational database to track this information. Something like Postgres is designed for safely storing this information and allowing transactional updates to records – Sam Mason Jan 08 '23 at 22:58
  • Maybe you should use something like Redis (an in-memory cache) – Abdul Aziz Barkat Jan 09 '23 at 17:52
  • Are you talking about server RAM or are these being passed to a client (web browser)? – Travis J Jan 09 '23 at 18:45
  • How much RAM is it currently using, and how little RAM are you hoping for? – Kelly Bundy Jan 22 '23 at 06:25
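The key-deduplication idea discussed in the comments can be sketched as follows: rebuild each profile so that its repeated string keys (`"data"`, `"contacts"`, `"meta_data"`, ...) are passed through `sys.intern`, so 100K profiles share one string object per key instead of potentially carrying 100K copies. The helper name `intern_keys` is made up for illustration.

```python
import sys

def intern_keys(obj):
    """Recursively rebuild dicts so their string keys are interned."""
    if isinstance(obj, dict):
        return {sys.intern(k): intern_keys(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(intern_keys(v) for v in obj)
    return obj

profile = ({"data": {"contacts": [11, 22], "address": "usa"},
            "meta_data": {"created_by": "John"}},
           {"key_3": "yes"})
deduped = intern_keys(profile)
```

This only helps if the keys are loaded as separate string objects in the first place (e.g. when parsing from a file); CPython already interns many identifier-like literals in source code.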

0 Answers