I am creating a Trie in memory. Each node contains a word. Performance-wise it is extremely good, but the catch is the memory consumption.
It is 6GB big! I serialized it with protobuf and wrote it to a file that came out to be 150MB.
As JSON it is 250MB. Is there a way to minify the strings? For example:
As you can see, there are duplicates in the first column. Also, the conversion should be reversible.
All the properties/columns are strings.
So let's say the table gets converted to:
I think that would save a lot of space. Of course, I could do this myself by inserting each cell into a dictionary and assigning it an integer, but I do not want to reinvent the wheel unless I have to.
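
To make the manual dictionary approach concrete, here is a minimal Python sketch of it. The function names `encode_table` and `decode_table` and the sample rows are made up for illustration and are not my real data. Each distinct string is stored once in a list, and every cell is replaced by that string's index, so the mapping is reversible.

```python
from typing import Dict, List, Tuple


def encode_table(rows: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Replace every cell with an integer id; each distinct string is kept once."""
    ids: Dict[str, int] = {}      # string -> id, used only while encoding
    strings: List[str] = []       # id -> string, needed later for decoding
    encoded: List[List[int]] = []
    for row in rows:
        encoded_row = []
        for cell in row:
            idx = ids.get(cell)
            if idx is None:
                # First time this string is seen: give it the next free id.
                idx = len(strings)
                ids[cell] = idx
                strings.append(cell)
            encoded_row.append(idx)
        encoded.append(encoded_row)
    return strings, encoded


def decode_table(strings: List[str], encoded: List[List[int]]) -> List[List[str]]:
    """Reverse the encoding by looking each id up in the string list."""
    return [[strings[i] for i in row] for row in encoded]


if __name__ == "__main__":
    # Hypothetical sample rows, only to show the round trip.
    rows = [["apple", "red"], ["apple", "green"], ["pear", "green"]]
    strings, encoded = encode_table(rows)
    print(strings)
    print(encoded)
    assert decode_table(strings, encoded) == rows
```

Only the `strings` list and the integer rows would need to be kept or serialized. How much this saves depends on how many cells are actually repeated.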