
I'm trying to set up a Hadoop 3 cluster.

Two questions about the Erasure Coding feature:

  1. How can I ensure that erasure coding is enabled?
  2. Do I still need to set the replication factor to 3?

Please indicate the configuration properties relevant to erasure coding and replication that give the same data protection as Hadoop 2 (replication factor 3), but with the disk space savings of Hadoop 3 erasure coding (only 50% overhead instead of 200%).
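The 200% and 50% figures come from simple arithmetic: replication factor N stores N full copies, while a Reed-Solomon layout with K data and M parity units stores (K+M)/K of the data. A minimal bash sketch of that calculation, with hypothetical function names; the erasure coding formula assumes files fill whole stripes, so partially filled stripes (for example small files) pay proportionally more for parity.

```shell
# Hypothetical helpers: raw disk space (MB) needed to store SIZE_MB of data.

# Replication factor N keeps N full copies of every block.
raw_mb_replicated() {  # usage: raw_mb_replicated N SIZE_MB
  echo $(( $1 * $2 ))
}

# Erasure coding with K data + M parity units stores (K+M)/K of the data.
# Assumes whole stripes; integer division rounds down.
raw_mb_erasure_coded() {  # usage: raw_mb_erasure_coded K M SIZE_MB
  echo $(( ($1 + $2) * $3 / $1 ))
}
```

For example, `raw_mb_replicated 3 1024` prints 3072 (200% overhead) and `raw_mb_erasure_coded 6 3 1024` prints 1536 (50% overhead).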

  • There isn't a single on/off configuration - https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html#Deployment – OneCricketeer Jul 23 '18 at 14:57
  • Hey. Thanks for the full documentation link. However, could you please provide best practices regarding Erasure Coding for people coming from Hadoop 2, along with concrete, simple examples? In particular, I'm interested in when to use Erasure Coding and when not to. I will then accept your answer! – Klun Jul 23 '18 at 18:21
  • Well, I have never used it, but it should be obvious that you would use it to save space on the filesystem, and save money on extra storage hardware. I would say not to use it, because Hadoop 2 has been fairly stable for several years and even the major Hadoop vendors don't fully support Hadoop 3 yet – OneCricketeer Jul 23 '18 at 19:17


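In Hadoop 3 you can apply an erasure coding policy to any directory in HDFS. Erasure coding is not used by default: files are replicated unless their directory has a policy, which you set with the setPolicy command and the directory path. The policy itself must also be enabled; by default only the policy named in dfs.namenode.ec.system.default.policy (RS-6-3-1024k) is enabled, and other policies are enabled with enablePolicy.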
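The two steps (enable a policy, then attach it to a directory) can be wrapped in a small bash function. This is a sketch under assumptions: `apply_ec_policy` and `HDFS_BIN` are hypothetical names, `HDFS_BIN` defaults to the `./bin/hdfs` path used in this answer, and `setPolicy` is expected to reject a policy that has not been enabled. Only files created in the directory after `setPolicy` are erasure coded; files already there keep their layout.

```shell
# Hypothetical: path to the hdfs launcher, overridable from the environment.
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Enable POLICY, then apply it to DIR. Stops if enabling fails.
apply_ec_policy() {  # usage: apply_ec_policy DIR POLICY
  local dir="$1" policy="$2"
  "$HDFS_BIN" ec -enablePolicy -policy "$policy" || return 1
  "$HDFS_BIN" ec -setPolicy -path "$dir" -policy "$policy"
}
```

For example, `apply_ec_policy /data/cold RS-6-3-1024k` (the path is illustrative).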

1: To check whether erasure coding is enabled on a directory, run the getPolicy command on that directory.
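Checking a directory from a script could look like the sketch below. `ec_policy_of` and `HDFS_BIN` are hypothetical names, and the function assumes `getPolicy` prints the policy name when one is set and a message containing "unspecified" otherwise; that message text may differ between Hadoop releases, so verify it against your version.

```shell
# Hypothetical: path to the hdfs launcher, overridable from the environment.
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Print DIR's erasure coding policy and return 0, or return 1 if none is set
# (or if the getPolicy call itself fails).
ec_policy_of() {  # usage: ec_policy_of DIR
  local out
  out="$("$HDFS_BIN" ec -getPolicy -path "$1")" || return 1
  case "$out" in
    *unspecified*) return 1 ;;   # assumed wording for "no policy set"
    *) echo "$out" ;;
  esac
}
```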

2: In Hadoop 3 the replication factor only applies to files in directories that have no erasure coding policy set with setPolicy. Erasure coding and replication can be used together in a single cluster, so you still keep replication factor 3 for the data you do not erasure code.
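For the replicated side, the cluster default replication factor is the `dfs.replication` property (3 unless overridden), which can be read with `hdfs getconf`, and existing files can be re-replicated with `hdfs dfs -setrep`. A minimal sketch with hypothetical function names; note that HDFS has no per-directory replication factor: `-setrep` on a directory changes the files already under it, while new files get the writing client's `dfs.replication`.

```shell
# Hypothetical: path to the hdfs launcher, overridable from the environment.
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Print the configured default replication factor.
default_replication() {
  "$HDFS_BIN" getconf -confKey dfs.replication
}

# Set replication factor N on the existing files under PATH (non-EC data).
set_replication() {  # usage: set_replication PATH N
  "$HDFS_BIN" dfs -setrep "$2" "$1"
}
```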

Command to list the supported erasure coding policies:

./bin/hdfs ec -listPolicies

Command to enable the XOR-2-1-1024k erasure coding policy:

./bin/hdfs ec -enablePolicy -policy XOR-2-1-1024k

Command to set an erasure coding policy on an HDFS directory:

./bin/hdfs ec -setPolicy -path /tmp -policy XOR-2-1-1024k
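XOR-2-1-1024k has the same 50% overhead as RS-6-3-1024k but survives only one lost unit per block group, whereas replication factor 3 survives two lost copies. The sketch below is a hypothetical helper that reads K and M from a built-in policy name of the form `<codec>-<K>-<M>-<cellsize>` and compares the number of tolerated losses with replication 3; it ignores rack placement and recovery cost.

```shell
# Hypothetical helper: describe a built-in policy name such as RS-6-3-1024k,
# XOR-2-1-1024k or RS-LEGACY-6-3-1024k against replication factor 3,
# which tolerates 2 lost copies.
describe_policy() {  # usage: describe_policy POLICY_NAME
  local name="$1" rest k m verdict
  rest="${name%-*}"    # drop the cell size:   RS-6-3
  m="${rest##*-}"      # parity units:         3
  rest="${rest%-*}"    # drop the parity part: RS-6
  k="${rest##*-}"      # data units:           6
  verdict="weaker than"
  if [ "$m" -ge 2 ]; then verdict="at least as tolerant as"; fi
  echo "$name: overhead $(( m * 100 / k ))%, tolerates $m lost unit(s), $verdict replication 3"
}
```

For example, `describe_policy XOR-2-1-1024k` reports one tolerated loss (weaker than replication 3), while `describe_policy RS-6-3-1024k` reports three at the same 50% overhead, which makes RS-6-3-1024k a better fit than XOR-2-1-1024k for the goal stated in the question.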

Command to get the policy set on a given directory:

./bin/hdfs ec -getPolicy -path /tmp

Command to remove the policy from a directory (i.e. unset the policy):

./bin/hdfs ec -unsetPolicy -path /tmp

Command to disable a policy:

./bin/hdfs ec -disablePolicy -policy XOR-2-1-1024k
