Background
I'm working on a Python project (version 2.7) that generates a large amount of data over a long span of time (i.e. days to weeks). The amount of data generated over a week can be anywhere from roughly 50MB to 2GB. The embedded system that runs this project has a decent amount of RAM (i.e. 64GB), but no persistent storage (data is offloaded over the network at random intervals). I'd like to keep the amount of memory used by this project to a minimum, while holding onto the in-memory data as long as possible.
Problem
This program is holding onto a massive in-memory list of string data, and I need to get the memory usage down. This data is not read frequently (i.e. once per day), but is updated frequently (i.e. ~1MB of data added at random intervals). Since it's just a bunch of human-readable strings of ASCII-formatted text, I'm considering compressing it in memory, so I could hold onto data for a longer period of time before having to prune/delete the "oldest" entries.
Question
Is there a way, using standard/built-in Python functionality (i.e. no third-party modules permitted), to:
- Create an in-memory, compressed list (to reduce the in-memory size of this massive list-of-strings).
- Allow for individual list entries to be extracted on-demand from the list.
- Allow for deleting individual entries (i.e. the "oldest" entries if the compressed in-memory list grows too large).
Work So Far
- Compression: I've found examples of using zlib and pickle to store a compressed copy of data in memory, but this doesn't appear to allow me to index into the compressed data to extract individual entries without first decompressing the entire object.
- Deletion and extraction: Same issue as above; it's an all-or-nothing approach.
- I could also create a list of compressed entries, but I'd prefer to compress the entire list rather than one entry at a time, to improve the compression ratio (i.e. to take advantage of solid compression, since the same data is often repeated from one entry to another).
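For reference, a minimal sketch of the per-entry approach described above, using only the standard zlib module (the CompressedList class and its method names are just my own illustration, not from any library). It satisfies the indexing and deletion requirements, but each entry is compressed in isolation, so redundancy between entries goes unexploited:

```python
import zlib


class CompressedList(object):
    """List-like container that zlib-compresses each entry individually.

    Indexing and deletion are cheap (only one entry is ever decompressed),
    but the compression ratio suffers because redundancy *between* entries
    is not exploited (no solid compression).
    """

    def __init__(self, level=9):
        self._level = level
        self._entries = []  # one compressed blob per entry

    def append(self, data):
        # data is a byte string (str on 2.7, bytes on 3)
        self._entries.append(zlib.compress(data, self._level))

    def __getitem__(self, index):
        # decompress a single entry on demand
        return zlib.decompress(self._entries[index])

    def __delitem__(self, index):
        # e.g. "del clist[0]" to prune the oldest entry
        del self._entries[index]

    def __len__(self):
        return len(self._entries)

    def compressed_size(self):
        # total bytes held by the compressed blobs
        return sum(len(blob) for blob in self._entries)


clist = CompressedList()
for i in range(100):
    clist.append(b"sensor reading %03d: temperature nominal, voltage nominal\n" % i)

print(len(clist))   # 100
print(clist[0])     # decompresses only entry 0
del clist[0]        # prune the oldest entry
print(len(clist))   # 99
```

What I'd ideally want is this same interface, but with the compression ratio of compressing the whole list as one stream.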