I have to read many (up to 5 mio.) small (9 KB) files. At the moment they are all in one directory. I fear this will take quadratic time or even n^2 log n for look up, is this right? Is this significant (will the lookup take more time than the actual reading)? Is there a difference in the asymptotic behavior of the running time when the file are cached by the OS?
I use C++-streams for reading the files. At the moment I'm using Windows 7 with NTFS, but I will later run the program on a linux cluster (not sure which file system).