The problem is not necessarily that a file system can't handle millions of files. It can.
The problem is that most of the tools typically available for manipulating files do not scale well to millions of them.
Consider both ls and rm.
By default, ls sorts its filenames. If you do a simple ls on a huge directory, it basically becomes unresponsive while it scans and sorts all of those millions of entries. You can tell ls not to sort; that works, but it's still slow.
rm suffers simply from the problem of filename expansion. Modern shells have pretty high base resource limits, but you still don't want to run shell expansion (e.g. "123*") over millions of files. You have to jump through hoops with things like find and xargs, and it's actually even better to write custom code, as sketched below.
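For illustration, here's a minimal sketch of that "custom code" approach (the directory path, cutoff, and file-name glob are assumptions). DirectoryStream iterates the directory lazily, so it never sorts or holds millions of names in memory the way shell expansion or a plain ls would:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

public class PurgeOldSessions {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/var/tomcat/sessions");          // hypothetical session directory
        Instant cutoff = Instant.now().minusSeconds(30 * 60);  // e.g. anything older than 30 minutes

        // Streams entries one at a time; nothing is sorted, nothing is expanded.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.session")) {
            for (Path file : stream) {
                FileTime modified = Files.getLastModifiedTime(file);
                if (modified.toInstant().isBefore(cutoff)) {
                    Files.delete(file);
                }
            }
        }
    }
}
```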
And heaven forbid you accidentally hit TAB in an autocompleting shell while in a directory with millions of entries.
The database does not suffer these issues. A table scan of millions of records is routine for the database. Operations on millions of anything take time, but the DB is much better suited for it, especially for small things like session entries (assuming your sessions are, indeed, small -- most tend to be).
The JDBCStore deftly routes around the file system problems and puts the load on a data store better suited to handling these kinds of volumes. File systems can make good, ad hoc "key-value" stores, but most of our actual work with file systems tends to involve scanning the values, and those tools don't work very well with large volumes.
Addenda after looking at the code.
It's easy to see why a large file store will crush the server.
Simply, with the FileStore, every time it tries to expire sessions, it reads in the entire directory.
So, best case, imagine reading in a 50M-file directory once per minute. This is not practical.
Not only does it read the entire directory, it then proceeds to read every single file within the directory to see whether it has expired. This is also not practical. With 50M files, even using a simple, say, 1024-byte buffer to read just the header of each file, that's 50 GB of data processing...every minute.
And that's on the optimistic assumption that it only checks once per minute, and not more often.
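Roughly, the access pattern described above amounts to something like this (a simplified illustration of the cost, not the actual FileStore code):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SweepIllustration {
    static void sweep(File sessionDir) throws IOException {
        File[] all = sessionDir.listFiles();      // pulls all ~50M entries into memory, every sweep
        if (all == null) {
            return;
        }
        for (File f : all) {
            try (InputStream in = new FileInputStream(f)) {
                byte[] header = new byte[1024];   // ~1 KB per file => ~50 GB read per sweep
                in.read(header);
                // ...deserialize just enough of the session to decide whether it has expired...
            }
        }
    }
}
```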
In contrast, within the JDBCStore, the expiration time is a first-class element of the model, so the store can simply select all rows whose expiration time has already passed. With an index on that field, that query is essentially instantaneous. Even better, when the logic goes to check whether a session has, indeed, expired, it's only checking those that meet the base criteria of the date, instead of every single session.
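For comparison, a sketch of the database-side sweep (the table and column names here -- tomcat_sessions, session_id, expiration_time -- are hypothetical). With an index on the expiration column, this is a cheap range query that returns only candidate sessions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class ExpiredSessionQuery {
    static void expireOldSessions(Connection conn) throws SQLException {
        // Hypothetical schema: sessions keyed by session_id, with an indexed expiration_time column.
        String sql = "SELECT session_id FROM tomcat_sessions WHERE expiration_time < ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String sessionId = rs.getString("session_id");
                    // ...load and expire only these candidates, not every session in the store...
                }
            }
        }
    }
}
```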
This is what's killing your system.
Now.
Could a FileStore be made to work better? I don't think so. There's no easy way (that I know of) to match wildcards in the file system itself; all of that matching is done against what is effectively a "table scan" of the files. So even though you'd think it would be easy to, say, append the expiration time to the end of the file name, you can't find that file (i.e., "find the file whose name starts with SESSIONID") without scanning all of them.
If the session metadata were all stored in RAM, then you could index it however you want. But you're in for an ugly startup time when the container starts, as it reloads all of the lingering sessions.
So, yeah, I think at scale the JDBCStore (or some other database/indexed solution) is the only really practical way to do things.
Or you could use the database just for the metadata, with the files storing the actual session information. You'd still need a database, but if you're uncomfortable storing your session BLOBs in the DB, that's an alternative.
Perhaps there are some filesystem-specific utilities that better leverage the actual file system architecture, which you could fork and then read the results of (or use JNI to talk to the FS directly), but obviously that would be quite file-system dependent. I'm not that intimate with the underlying capabilities of the different file systems.