I have a framework in which I need to read, say, 10,000 files (each roughly 10 MB), process them, and write another 10,000 files. In the next step I read those 10,000 files, do some processing on all of them, and write more files to disk. This whole cycle repeats several times.
Question: Is there an efficient way of storing these files contiguously to save read/write time? Something like tar. I don't need much compression; speed matters more. If I use tar, is there a way to index these 10,000 files (e.g., hash a file name to its offset) so that I can read any particular file in O(1) time?
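For context, here is a rough sketch of the kind of random access I have in mind, assuming an uncompressed tar and Python's tarfile module ("archive.tar" and read_member are placeholder names; TarInfo.offset_data is the attribute CPython's tarfile uses for the byte offset of a member's data within the archive):

```python
import tarfile

# Build an in-memory index mapping member name -> (data offset, size).
# Assumes the archive is uncompressed ("r:" mode), since seeking into
# a gzipped tar would not give O(1) access.
index = {}
with tarfile.open("archive.tar", "r:") as tar:
    for member in tar:
        if member.isfile():
            index[member.name] = (member.offset_data, member.size)

def read_member(archive_path, name):
    # One seek + one read per file: O(1) given the index.
    offset, size = index[name]
    with open(archive_path, "rb") as f:
        f.seek(offset)
        return f.read(size)

data = read_member("archive.tar", "some_file.bin")
```

Is something along these lines reasonable, or is there a better-suited container format for this workload?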