A Windows desktop application whose development team I'm part of uses the legacy MFC CArchive as its application file format, serialising both text and binary files to and from disk. The app is used to localise strings contained in these text/binary files, and the CArchive encapsulates a translation "project": it is generated as one monolithic file containing one or more of these sub-files.
This file format is showing its age in many ways, and we are looking to move to something more modern. Our main concerns are that it is slow and consumes huge amounts of memory. It is not random access, so accessing an arbitrary file within the archive, or even just generating a directory listing, requires loading the entire thing into memory; the space and time cost of manipulating the archive is therefore dependent on its size, and in-place updates of the archive are not feasible either.
Finally, extending the format is painful, as it means littering our code with conditional statements that serialise certain fields (or not) to or from the archive, depending on the value of the archive's version stamp.
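The versioning code ends up looking something like this (simplified, with made-up names; not our actual code):

    // Illustrative only: the kind of version-gated Serialize() override
    // that accumulates as the format evolves.
    static const DWORD kCurrentVersion = 5;

    void CProjectDoc::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            ar << kCurrentVersion;
            ar << m_name;             // CString, present since v1
            ar << m_sourceLanguage;   // CString, added in v2
            ar << m_pluralRules;      // CString, added in v5
        }
        else
        {
            DWORD version = 0;
            ar >> version;
            ar >> m_name;
            if (version >= 2)
                ar >> m_sourceLanguage;   // older archives simply lack this field
            if (version >= 5)
                ar >> m_pluralRules;
        }
    }

Every new field means another version bump and another pair of these checks.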
I've spent some time looking around at the alternatives, and the ones that stand out are ZIP/7Z and SQLite. ZIP already has most of the file management/indexing functionality built in, while SQLite would be ideal for the storage, retrieval and searching of strings, so I'm thinking some combination of the two technologies might be the way to go.
As far as I can see, the trick would be organising or partitioning the SQLite DB in such a way that it does not slow down as it grows, and searches can be restricted to individual files. That might mean one table per file or one DB per file; I'm not sure.
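For instance, as I understand it, the "one table per file" variant would be roughly this (a rough sketch with placeholder names, assuming the file ID has already been sanitised into a valid identifier):

    #include <sqlite3.h>
    #include <string>

    // Create a strings table dedicated to a single sub-file of the project.
    bool CreateTableForFile(sqlite3* db, const std::string& fileId)
    {
        const std::string sql =
            "CREATE TABLE IF NOT EXISTS strings_" + fileId + " ("
            "  id          INTEGER PRIMARY KEY,"
            "  resource_id TEXT NOT NULL,"   // key of the string within the sub-file
            "  source      TEXT NOT NULL,"   // original string
            "  translation TEXT);";          // localised string, NULL until translated

        char* err = nullptr;
        if (sqlite3_exec(db, sql.c_str(), nullptr, nullptr, &err) != SQLITE_OK)
        {
            sqlite3_free(err);   // real code would log err first
            return false;
        }
        return true;
    }

A search restricted to an individual file would then only ever touch that file's table.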
Has anyone else tried such a thing and if so, any advice?
Thanks