
My Hadoop version is 2.5.2. I am changing dfs.blocksize in the hdfs-site.xml file on the master node. I have the following questions:

1) Will this change affect the existing data in HDFS?

2) Do I need to propagate this change to all the nodes in the Hadoop cluster, or is changing it on the NameNode sufficient?

Tariq

4 Answers


1) Will this change affect the existing data in HDFS?

No, it will not. Old files keep the block size they were written with. For existing data to pick up the new block size, you need to rewrite it. You can either do a hadoop fs -cp or a distcp on your data. The new copy will have the new block size, and you can then delete your old data.
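For example, a rewrite could look like this (a minimal sketch; /data/old and /data/new are hypothetical paths, and distcp runs as a MapReduce job):

hadoop distcp /data/old /data/new
hadoop fs -rm -r /data/old
hadoop fs -mv /data/new /data/old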

2) Do I need to propagate this change to all the nodes in the Hadoop cluster, or is changing it on the NameNode sufficient?

I believe in this case you only need to change it on the NameNode. However, letting the configuration drift between nodes is a very bad idea. You should keep all of your configuration files in sync, for a number of good reasons. When you get more serious about your Hadoop deployment, you should probably start using something like Puppet or Chef to manage your configs.

Also, note that whenever you change a configuration, you need to restart the NameNode and DataNodes in order for them to change their behavior.
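With the stock scripts from the Apache tarball, a cluster-wide HDFS restart could look like this (a minimal sketch; assumes $HADOOP_HOME points at your install and passwordless SSH to the slaves is set up):

$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh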

Interesting note: you can set the block size of individual files as you write them, overriding the default. E.g. (in Hadoop 2.x the relevant property is dfs.blocksize):

hadoop fs -D dfs.blocksize=134217728 -put a b

Donald Miner

Changing the block size in hdfs-site.xml will only affect new data.
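To confirm what block size an existing file actually has, you can inspect it with fsck (a minimal sketch; /some/file is a hypothetical path):

hdfs fsck /some/file -files -blocks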

Bhuvan

You should make the change in hdfs-site.xml on all the slaves as well; dfs.blocksize should be consistent across all DataNodes.

Bhuvan
  • Thanks. What would be the preferred way of changing it on all the nodes [30+ nodes in the cluster]? – Tariq Feb 18 '15 at 17:06
  • Which distribution are you using? From your questions it looks like the Apache distribution. The easiest way I can find is to write a shell script that first deletes hdfs-site.xml on the slaves, like below – Bhuvan Feb 18 '15 at 17:11

Which distribution are you using? From your questions it looks like you are using the Apache distribution. The easiest way I can find is to write a shell script that first deletes hdfs-site.xml on the slaves, like:

ssh username@domain.com 'rm /some/hadoop/conf/hdfs-site.xml'
ssh username@domain2.com 'rm /some/hadoop/conf/hdfs-site.xml'
ssh username@domain3.com 'rm /some/hadoop/conf/hdfs-site.xml'

Then copy hdfs-site.xml from the master to all the slaves:

scp /hadoop/conf/hdfs-site.xml username@domain.com:/hadoop/conf/ 
scp /hadoop/conf/hdfs-site.xml username@domain2.com:/hadoop/conf/ 
scp /hadoop/conf/hdfs-site.xml username@domain3.com:/hadoop/conf/ 
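With 30+ nodes, looping over a host list is easier to maintain (a minimal sketch; slaves.txt is a hypothetical file with one hostname per line, and the conf path is assumed to be the same on every node):

while read host; do
  scp /hadoop/conf/hdfs-site.xml username@$host:/hadoop/conf/
done < slaves.txt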
Bhuvan
  • Apache Hadoop 2.5.2. Could you please answer this question as well: http://stackoverflow.com/questions/28586561/yarn-container-lauch-failed-exception-and-mapred-site-xml-configuration – Tariq Feb 18 '15 at 17:19