In Hadoop 2.0 the default replication factor is 3, so the number of acceptable node failures is 3 - 1 = 2.
So on a 100-node cluster, if a file is divided into, say, 10 parts (blocks), then with a replication factor of 3 the total number of storage blocks required is 30. If the 3 nodes holding a block X and its replicas all fail, the file is not recoverable. Even if the cluster had 1000 nodes, or the file were split into 20 parts, the failure of 3 nodes could still be disastrous for the file.
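A minimal sketch (plain Java, not Hadoop API) of the arithmetic above, assuming the same hypothetical 10-block file: with replication, the worst-case tolerance is fixed by the replication factor alone, no matter how many blocks the file has or how big the cluster is.

```java
// Sketch only: illustrates the replication arithmetic from the question.
public class ReplicationTolerance {
    public static void main(String[] args) {
        int replicationFactor = 3;   // Hadoop 2 default
        int fileBlocks = 10;         // hypothetical file split into 10 blocks
        int storedBlocks = fileBlocks * replicationFactor;   // 30 blocks on disk
        int toleratedFailures = replicationFactor - 1;       // 2, worst case

        System.out.println("Blocks stored on the cluster: " + storedBlocks);
        System.out.println("Worst-case node failures tolerated: " + toleratedFailures);
    }
}
```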
Now stepping into Hadoop 3.0.
With erasure coding, Hadoop says it provides the same durability with 50% storage efficiency. And this follows from how the Reed-Solomon method works: for k data blocks and n parity blocks, at least k of the (k + n) blocks must be accessible for the file to be recoverable/readable.
So for the same file above: there are 10 data blocks, and to keep the storage efficiency at 50%, 5 parity blocks can be added. Of the 10 + 5 blocks, any 10 must be available for the file to be accessible. On the 100-node cluster, if each of the 15 blocks is stored on a separate node, then a total of 5 node failures is acceptable. Storing the same file (i.e., 15 blocks) on a 1000-node cluster would make no difference with respect to the number of acceptable node failures; it is still 5.
But the interesting part is this: if the same file (or another file) were divided into 20 data blocks and 10 parity blocks were added, then for those 30 blocks stored across the 100-node cluster, the acceptable number of node failures is 10.
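A minimal sketch (plain Java, not Hadoop API) of the erasure-coding arithmetic for the two cases above. The 10+5 and 20+10 splits are the hypothetical figures used in this question, not Hadoop's built-in EC policies.

```java
// Sketch only: Reed-Solomon RS(k, n) tolerance as described in the question.
public class ErasureCodingTolerance {
    // With RS(k, n), any k of the (k + n) blocks are enough to rebuild the file,
    // so the worst-case number of lost nodes that can be survived is n
    // (assuming each block lives on a separate node).
    static int toleratedFailures(int dataBlocks, int parityBlocks) {
        return parityBlocks;
    }

    public static void main(String[] args) {
        // Parity count fixed at 50% of the data blocks, as in the question.
        System.out.println(toleratedFailures(10, 5));  // file split into 10 blocks -> 5
        System.out.println(toleratedFailures(20, 10)); // file split into 20 blocks -> 10
    }
}
```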
The point I want to make here is:
in Hadoop 2, the number of acceptable node failures is ReplicationFactor - 1, so it is clearly based on the replication factor, and this is a cluster-wide property;
but in Hadoop 3, if the storage efficiency were fixed at 50%, the number of acceptable node failures seems to differ from file to file, depending on the number of blocks the file is divided into.
So can anyone comment on whether the above inference is correct, and on how a cluster's acceptable number of node failures is determined?
(I did not want to complicate the above, so I did not discuss the edge case of a file with only one block. But I guess the algorithm will be smart enough to replicate it as-is, or with parity data, so that the durability guarantees are still met.)
Edit:
This question is part of a series of questions I have on EC. The others are listed below:
Hadoop 3.0 erasure coding: impact on MR jobs performance?