First off, have a look at this: Storing a million images in the filesystem. While it isn't about backups, it is a worthwhile discussion of the topic at hand.
And yes, large numbers of small files are pesky; they take up inodes, require space for filenames, etc., and it takes time to back up all of that metadata. Basically, it sounds like you have the serving of the files figured out: if you run it on nginx, with varnish in front or the like, you can hardly make it any faster. Adding a database under that will only make things more complicated, also when it comes to backing up. So I would suggest working harder on an in-place FS backup strategy.
First, have you tried rsync with the -az switches (archive and compression, respectively)? It tends to be highly effective, since rsync doesn't transfer the same files again and again.
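As a minimal sketch (the paths and the host name are placeholders, so adjust them to your setup):

    # mirror the image tree to the backup host; only new/changed files are sent
    rsync -az /var/www/images/ backup.example.tld:/backups/images/

Run it from cron and you have a basic incremental backup without any extra tooling.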
Alternatively, my suggestion would be to tar + gzip the files into a number of archives. In shell (and assuming you have them sorted into different sub-folders):
for prefix in $(ls -1); do
    # archive one sub-folder, compress it, and stream it straight to the backup host
    tar -c "$prefix" | gzip -9 | ssh destination.example.tld "cat > backup_$(date -I)_$prefix.tar.gz"
done
This will create a number of .tar.gz files that are easily transferred without too much overhead.
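For completeness, restoring one of those archives on the destination host is then a plain extraction (the file name below is just a placeholder matching the pattern the loop produces):

    # unpack a single archive back into its original sub-folder
    tar -xzf backup_2024-01-01_img.tar.gz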