Suppose you have a lot of source code (like 50GB+) in popular languages (Java, C, C++, etc).
The project needs are:
- compress the source code to reduce disk use and disk I/O
- index it in such a way that a particular source file can be extracted from the compressed archive without decompressing the whole thing
- compression time for the whole codebase is not important
- search and retrieval time (and memory use while searching and retrieving) are important
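For concreteness, the simplest baseline I can think of that satisfies the random-access requirement is to compress each file independently and keep a side index of (offset, size) pairs, so retrieval is a seek plus a single decompress. This is just a hypothetical sketch (function and file names are mine), not a proposed solution; per-file compression sacrifices cross-file redundancy, which is exactly why I am asking about better structures:

```python
import json
import zlib

def build_archive(src_files, archive_path, index_path):
    """Compress each file separately; record its (offset, size) in an index."""
    index = {}
    with open(archive_path, "wb") as out:
        for path in src_files:
            with open(path, "rb") as f:
                blob = zlib.compress(f.read(), 9)
            index[path] = (out.tell(), len(blob))
            out.write(blob)
    with open(index_path, "w") as f:
        json.dump(index, f)

def extract_file(path, archive_path, index_path):
    """Retrieve one file without decompressing the rest of the archive."""
    with open(index_path) as f:
        index = json.load(f)
    offset, size = index[path]
    with open(archive_path, "rb") as arc:
        arc.seek(offset)
        return zlib.decompress(arc.read(size))
```

Anything that beats this baseline on compression ratio while keeping O(1)-ish random access would be of interest.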
The SO question "What are the lesser known but useful data structures?" contains potential answers. However, that is just a list of candidates - I do not know how those structures actually measure up against the requirements listed above.
Question: which data structures (and which implementations of them) would perform well under the aforementioned requirements?