As per the Hadoop 3.x release notes, erasure coding was introduced to reduce the storage overhead of replication.
Erasure coding is a method for durably storing data with significant space savings compared to replication. Standard encodings like Reed-Solomon (10,4) have a 1.4x space overhead, compared to the 3x overhead of standard HDFS replication.
Since erasure coding imposes additional overhead during reconstruction and performs mostly remote reads, it has traditionally been used for storing colder, less frequently accessed data. Users should consider the network and CPU overheads of erasure coding when deploying this feature.
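If I understand the numbers correctly, the 1.4x and 3x figures are just the ratio of bytes stored to bytes of user data:

```
RS(10,4):       (10 data + 4 parity) / 10 data = 1.4x stored per byte of data
3x replication:  3 copies / 1 original          = 3.0x stored per byte of data
```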
I am looking for sample configuration files for this.
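To make it concrete, something along these lines is what I have in mind for hdfs-site.xml. The property below is the system default EC policy setting as I understand it from the Hadoop 3.x documentation; please correct me if other properties are needed:

```xml
<!-- hdfs-site.xml: my current understanding of the relevant setting -->
<property>
  <name>dfs.namenode.ec.system.default.policy</name>
  <value>RS-6-3-1024k</value>
  <description>Default EC policy used when -setPolicy is run without -policy.</description>
</property>
```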
Also, even after setting up the EC policy and enabling it using hdfs ec -enablePolicy, does the policy apply only to cold files, or is it applied to all HDFS files by default?
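For reference, these are the steps I ran (the path /data/cold and the policy RS-10-4-1024k are just examples from my test setup):

```
# list the EC policies known to the cluster and their state
hdfs ec -listPolicies

# enable one of the built-in policies
hdfs ec -enablePolicy -policy RS-10-4-1024k

# attach the policy to a directory (my assumption is that this step is also required)
hdfs ec -setPolicy -path /data/cold -policy RS-10-4-1024k

# verify which policy is in effect on the directory
hdfs ec -getPolicy -path /data/cold
```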