
I am trying to write a file into HDFS using the Scala FileSystem API, and I am getting the following error on the client as well as in the Hadoop logs:

File /user/testuser/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

testuser has permission to read, write and execute. I checked HDFS in Ambari and it is up and running, so I am not sure why I am getting this error.

(Screenshot of the Ambari dashboard omitted.)

After googling the error, I have already tried stopping all services, formatting the namenode and starting all services again, as suggested in the link below:

Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)

I still get the same error. Any suggestions on what I am doing wrong? I am new to Hadoop, so any suggestions will be appreciated.

The following is the Scala code I am using:

import java.io.BufferedOutputStream

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

def write(uri: String, filePath: String, data: Array[Byte]): Unit = {
  // Act as this HDFS user when talking to the cluster
  System.setProperty("HADOOP_USER_NAME", "usernamehere")
  val path = new Path(filePath)
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  // Connect to datanodes by hostname rather than the IP the namenode reports
  conf.set("dfs.client.use.datanode.hostname", "true")
  conf.addResource(new Path("/path/core-site.xml"))
  conf.addResource(new Path("/path/hdfs-site.xml"))
  val fs = FileSystem.get(conf)
  val os = fs.create(path)
  fs.setPermission(path, FsPermission.getDefault)
  val out = new BufferedOutputStream(os)
  println(data.length)
  out.write(data)
  out.flush()
  out.close()
  fs.close()
}
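
For reference, I call it roughly like this (the namenode host/port and the file content below are just placeholder values, not my real cluster settings):

// Hypothetical invocation; replace host, port, path and content with real values
write("hdfs://namenode-host:8020", "/user/testuser/test.txt", "some test content".getBytes("UTF-8"))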

Thanks


1 Answer


For writing a file to HDFS, you should only use HDFS commands such as copyFromLocal.

Assuming you are using Spark with Scala, you need to use Spark's file-writing API, for example:

some_dataframe.write.mode(SaveMode.Overwrite).parquet("c:\\MyWorkSpace\\Spark\\")

The above command is understood by HDFS, which replicates the data as per the replication factor. But if you use the Scala file system API against HDFS, it causes issues, because Scala cannot understand HDFS features like replication, data blocks and partitions.
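
A minimal, self-contained sketch of that Spark approach (the SparkSession setup, the example dataframe and the HDFS output URI are placeholders, not values from the question):

import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch only: build a session, create a small example dataframe and write it to HDFS.
// The app name, the dataframe contents and the output path are placeholders.
val spark = SparkSession.builder().appName("hdfs-write-example").getOrCreate()
import spark.implicits._

val some_dataframe = Seq(("a", 1), ("b", 2)).toDF("key", "value")

some_dataframe.write
  .mode(SaveMode.Overwrite)
  .parquet("hdfs://namenode-host:8020/user/testuser/output")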

  • Scala doesn't need to understand replication or block placement... The FileSystem API is no different than calling `hdfs put` from the command line – OneCricketeer Mar 19 '18 at 13:29
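
To illustrate the comment's point, here is a minimal sketch of copying a local file into HDFS through the same FileSystem API, which is essentially what `hdfs dfs -put` / copyFromLocal does (the namenode URI and both paths are placeholders):

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: programmatic equivalent of `hdfs dfs -put`; the URI and paths are placeholders.
val conf = new Configuration()
val fs = FileSystem.get(new URI("hdfs://namenode-host:8020"), conf)
fs.copyFromLocalFile(new Path("/tmp/test.txt"), new Path("/user/testuser/test.txt"))
fs.close()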