I have lots of data that I would like to save to disk in binary form, and I would like to get as close to having ACID properties as possible. Since there is too much data to keep it all in memory, I understand I have two basic approaches:
- Have lots of small files (e.g. write to disk every minute or so, roughly as sketched after this list) - in case of a crash I lose only the last file, but performance will be worse.
- Have a large file (e.g. open, modify, close) - best sequential read performance afterwards, but in case of a crash I can end up with a corrupted file.
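For what it's worth, this is roughly how I picture the first option; the directory, file naming, and the `Flush(true)` call are just placeholders for illustration:

```csharp
using System;
using System.IO;

// Rough sketch of the "lots of small files" option: every chunk of data goes
// into its own timestamped file, so a crash should only affect the file that
// was being written at that moment.
class ChunkWriter
{
    private readonly string _directory;

    public ChunkWriter(string directory)
    {
        _directory = directory;
    }

    public void WriteChunk(byte[] data)
    {
        string path = Path.Combine(
            _directory, DateTime.UtcNow.ToString("yyyyMMdd_HHmmssfff") + ".bin");

        using (var fs = new FileStream(path, FileMode.CreateNew,
                                       FileAccess.Write, FileShare.None))
        {
            fs.Write(data, 0, data.Length);
            fs.Flush(true); // ask the OS to flush its buffers to the physical disk
        }
    }
}
```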
So my question is specifically:
If I choose to go for the large file option and open it as a memory-mapped file (or using `Stream.Position` and `Stream.Write`), and there is a loss of power, are there any guarantees about what could possibly happen to the file?
Is it possible to lose the entire large file, or just end up with the data corrupted in the middle?
Does NTFS ensure that a block of a certain size (4k?) always gets written entirely?
Is the outcome better/worse on Unix/ext4?
I would like to avoid using NTFS TxF since Microsoft has already said it plans to retire it. I am using C#, but the language probably doesn't matter.
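To make the large file option concrete, this is roughly the write pattern I have in mind; the path, offsets, and class are made up for the example, and the memory-mapped variant would go through `MemoryMappedFile.CreateFromFile` and `MemoryMappedViewAccessor.Flush` instead:

```csharp
using System.IO;

// Rough sketch of the large-file option: seek with Stream.Position, overwrite
// a region with Stream.Write, then flush.
class LargeFileWriter : System.IDisposable
{
    private readonly FileStream _stream;

    public LargeFileWriter(string path)
    {
        _stream = new FileStream(path, FileMode.OpenOrCreate,
                                 FileAccess.ReadWrite, FileShare.None);
    }

    public void WriteAt(long offset, byte[] data)
    {
        _stream.Position = offset;            // jump into the middle of the file
        _stream.Write(data, 0, data.Length);  // overwrite existing bytes
        _stream.Flush(true);                  // flush intermediate buffers to disk
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}
```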
(additional clarification)
It seems that there should be some guarantee, because -- unless I am wrong -- if it were possible to lose the entire file (or suffer really weird corruption) while writing to it, then no existing DB would be ACID, unless they either 1) use TxF or 2) make a copy of the entire file before writing? I don't think a journal will help you if you lose parts of the file you didn't even plan to touch.
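In case it clarifies what I mean by 2), a minimal sketch of the copy-before-writing approach; file names are placeholders, and whether the final swap itself survives a power loss is exactly the kind of guarantee I'm asking about:

```csharp
using System.IO;

// Sketch of alternative 2) above: write the new version next to the old one,
// flush it, then swap the two files.
static class CopyThenSwap
{
    public static void Rewrite(string path, byte[] newContents)
    {
        string temp = path + ".tmp";
        string backup = path + ".bak";

        using (var fs = new FileStream(temp, FileMode.Create,
                                       FileAccess.Write, FileShare.None))
        {
            fs.Write(newContents, 0, newContents.Length);
            fs.Flush(true); // make sure the new copy has reached the disk first
        }

        // Swap the new file in; the previous version is kept as a backup.
        File.Replace(temp, path, backup);
    }
}
```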