I'm doing some javascript 3D processing, and i have a very large amount of objects (say object A), each one containing some stuff and an array of positive integers, such as [0, 1, 4], [1, 5, 74, 1013], etc. They don't need to have a private value, all objects can share the same list. Thoses numbers can go from 0 to a few thousands, say 65k (a short).
Profiling revealed that thoses arrays are eating a lot of memory. When computing, my program reach more than 2GB of allocated mem, this is not some case of stupid pre-optimisation.
I have 2 leads for memory optimisation:
- find a more memory-effective way to store thoses lists (Maybe array of bits in big numbers?)
- Find some way to avoid duplicates. For instance, I happened to find that some arrays (like [0,1,2,3,4,5,6]) was present in more than 40 000 objects A. Maybe storing thoses arrays in a tree structure and making my objects A point to it would help?
Do you have a suggestion?
EDIT: I forgot to mention it but it's important: each integer in the list is UNIQUE. EDIT2: The only important thing to retrieve is the SET of integers, the order isn't important.
I'm thinking of converting thoses arrays to "Big Integers" with bitwise operations i.e. create a Big Integer with some class, set the bits 1, 5, 74, 1013, the convert the big int to a string (array of 8 bytes) and store the string, but it won't always be a gain (for instance, the array [4000] will be represented as a 500 byte-long string...)
Scope of the project (useless, but i've been asked for it)
This is supposed to be useless to answer the question, but i've been asked for it several times, i put it here.
I'm building a 3D mesh of volumic objects, to simplify, let's just assume i have a lot of spheres. I know their position (center, ray) and i want to draw them in a single 3D Mesh. For that, i have a memory structure called an Octree that allow me to divide the 3D space in lower cells (the octree nodes) around the surface of my object. I can then build a mesh from thoses cells.
Thoses cells are what i called object A in the description. Each cell contains a list of ids (positive integers) which points to the Sphere objects the cell intersects.
Fact is that profiling showed that thoses cells arrays are retaining several hundred of MB in memory. I would like to reduce that number, by finding a way to remove all duplicates and/or if possible, finding a more effective way to store a list of positive ids (that can go from 0 to 65k).