Unable to load large file to HDFS on Spark cluster master node

Question

I have started a Spark cluster on Amazon EC2 with 1 master node and 2 worker nodes, each with 2.7 GB of memory.

However, when I try to put a 3 GB file onto HDFS with the command below:

/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin

it fails with the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". I am able to upload smaller files, but not once the file exceeds a certain size (about 2.2 GB).

If the file exceeds the memory size of a node, shouldn't Hadoop split it across the other nodes?

What do you mean by "have 2.7gb of memory each"? Do you mean RAM or hard disk? – Yaron, Apr 03 '16 at 10:46

Answer

Edit: Summary of my understanding of the issue you are facing:

1) Total free HDFS space is 5.32 GB

2) Free HDFS space on each data node is 2.66 GB

Note: the report shows 4 under-replicated blocks (and 0 blocks with corrupt replicas)

The following Q&A describes a similar issue: "Hadoop put command throws - could only be replicated to 0 nodes, instead of 1"

In that case, running jps showed that the DataNodes were down.

These Q&As describe how to restart a DataNode:

"What is best way to start and stop hadoop ecosystem, with command line?"

"Hadoop - Restart datanode and tasktracker"

Please try restarting your DataNodes and let us know whether that solves the problem.
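Before and after a restart, it helps to confirm whether the DataNode JVM is actually running. `jps` prints one `<pid> <MainClassName>` line per Java process, and the HDFS data node shows up as `DataNode`. The sketch below is a hypothetical helper (`datanode_running` is not part of Hadoop) that reads the `jps` listing on stdin so it can be checked without a cluster:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a DataNode JVM appears in `jps` output on stdin.
# jps prints "<pid> <MainClassName>" per line; match the class name exactly.
datanode_running() {
  awk '$2 == "DataNode" { found = 1 } END { exit !found }'
}

# Typical use on each data node (requires a JDK's jps on PATH):
#   jps | datanode_running && echo "DataNode is up" || echo "DataNode is down"
```

Assuming a Hadoop 1.x layout like the ephemeral-hdfs install in the question, the daemon can usually be restarted on each data node with `bin/hadoop-daemon.sh stop datanode` followed by `bin/hadoop-daemon.sh start datanode` (on Hadoop 2.x the script lives in `sbin/`).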

When using HDFS, you have one shared file system, i.e. all nodes share the same file system.
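This is also why a file larger than one node's free space can still be stored: HDFS cuts every file into fixed-size blocks (`dfs.block.size`, 64 MB by default in Hadoop 1.x; `dfs.blocksize`, 128 MB by default in Hadoop 2.x), and the NameNode places each block on DataNodes independently. A minimal sketch of the block count, assuming the 64 MB default (check your `hdfs-site.xml` for the real value):

```shell
#!/usr/bin/env bash
# Number of HDFS blocks a file occupies: ceiling(file_bytes / block_bytes).
# The default block size below is an assumption: 67108864 bytes (64 MB), the
# Hadoop 1.x default for dfs.block.size. Pass a second argument to override it.
hdfs_block_count() {
  local file_bytes=$1
  local block_bytes=${2:-67108864}
  echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

hdfs_block_count $(( 3 * 1024 * 1024 * 1024 ))   # 3 GB file, 64 MB blocks -> 48
```

The file therefore does not have to fit on a single data node; it needs enough free space across the cluster, multiplied by the replication factor.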

From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put a 3 GB file there.

Run the following commands to get the free HDFS space:

hdfs dfs -df -h

hdfs dfsadmin -report

or, for older versions of Hadoop:

hadoop fs -df -h

hadoop dfsadmin -report
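To check a file against the report before running `-put`, the cluster-wide "DFS Remaining" value can be pulled out of the `dfsadmin -report` output. This is a hedged sketch: the function names are hypothetical, and it assumes the report's usual `DFS Remaining: <bytes> (<human size>)` line format shown later in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first "DFS Remaining:" value (bytes) from
# `dfsadmin -report` output on stdin. The first such line is the cluster-wide
# total; later ones are per DataNode. "DFS Remaining%:" lines do not match.
dfs_remaining_bytes() {
  awk '/DFS Remaining:/ { v = $3; sub(/[(].*/, "", v); print v; exit }'
}

# Hypothetical helper: succeed if the local file $1 is no larger than the free
# HDFS space in the report on stdin. Replication is ignored: with replication
# factor r, the file needs roughly r times its size.
file_fits_in_hdfs() {
  local file_bytes remaining
  [ -f "$1" ] || return 1
  file_bytes=$(( $(wc -c < "$1") ))   # $(( )) strips padding some wc versions add
  remaining=$(dfs_remaining_bytes)
  [ -n "$remaining" ] && [ "$file_bytes" -le "$remaining" ]
}

# Example (hadoop path and file from the question; /tmp/report.txt is arbitrary):
#   /root/ephemeral-hdfs/bin/hadoop dfsadmin -report > /tmp/report.txt
#   file_fits_in_hdfs /root/spark/2GB.bin < /tmp/report.txt && echo "fits" || echo "too large"
```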

answered Apr 03 '16 at 10:48 by Yaron (edited May 23 '17 at 11:59)

When it says "DFS Remaining: 5713575936 (5.32 GB)" on the master node and "DFS Remaining: 2856787968(2.66 GB)" on the 2 data nodes, do those figures refer to disk space or RAM? – Stanley Apr 03 '16 at 12:39
Can you please provide the command you executed and its output? – Yaron Apr 03 '16 at 12:55
Command executed: ./hadoop dfsadmin -report. Output:

Configured Capacity: 8443527168 (7.86 GB)
Present Capacity: 5713715200 (5.32 GB)
DFS Remaining: 5713575936 (5.32 GB)
DFS Used: 139264 (136 KB)
DFS Used%: 0%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Decommission Status : Normal
Configured Capacity: 4221763584 (3.93 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 1364905984 (1.27 GB)
DFS Remaining: 2856787968(2.66 GB)
DFS Used%: 0%
DFS Remaining%: 67.67%

– Stanley Apr 03 '16 at 12:59
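On the disk-or-RAM question: the `dfsadmin -report` figures describe disk space in the DataNodes' storage directories, not RAM. A quick arithmetic check on the numbers in the report above (values copied verbatim; the `to_gib` helper is just for illustration) shows how they fit together and that the report uses binary units:

```shell
#!/usr/bin/env bash
# Per-DataNode figures copied from the dfsadmin -report output above (bytes).
node_dfs_used=69632            # DFS Used (68 KB)
node_non_dfs_used=1364905984   # Non DFS Used (1.27 GB): non-HDFS files on the same volume
node_remaining=2856787968      # DFS Remaining (2.66 GB)
datanodes=2
cluster_remaining=5713575936   # cluster-wide DFS Remaining (5.32 GB)

# Configured Capacity (4221763584) = DFS Used + Non DFS Used + DFS Remaining.
echo $(( node_dfs_used + node_non_dfs_used + node_remaining ))   # 4221763584

# The cluster-wide figure is the sum of the per-DataNode figures.
echo $(( node_remaining * datanodes ))                           # 5713575936

# The report uses binary units (1 GB = 1024^3 bytes); print two truncated decimals.
to_gib() {
  printf '%d.%02d\n' $(( $1 / 1073741824 )) $(( ($1 % 1073741824) * 100 / 1073741824 ))
}
to_gib "$cluster_remaining"   # 5.32
to_gib "$node_remaining"      # 2.66
```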
