
I am trying to write a file into HDFS using the Scala FileSystem API, and I am getting the following error on the client as well as in the Hadoop logs:

File /user/testuser/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

testuser has permission to read, write and execute. I checked HDFS in Ambari and it is up and running, so I am not sure why I am getting this error.

(Screenshot of the Ambari dashboard omitted.)

After googling the error, I have already tried stopping all services, formatting the namenode and starting all services again, as suggested in the link below:

Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)

I still get the same error. Any suggestions on what I am doing wrong? I am new to Hadoop, so any suggestions will be appreciated.

The following is the Scala code I am using:

import java.io.BufferedOutputStream

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

def write(uri: String, filePath: String, data: Array[Byte]): Unit = {
  // Act as this HDFS user when talking to the cluster
  System.setProperty("HADOOP_USER_NAME", "usernamehere")
  val path = new Path(filePath)
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  // Connect to datanodes by hostname rather than the IP the namenode reports
  conf.set("dfs.client.use.datanode.hostname", "true")
  conf.addResource(new Path("/path/core-site.xml"))
  conf.addResource(new Path("/path/hdfs-site.xml"))
  val fs = FileSystem.get(conf)
  val os = fs.create(path)
  fs.setPermission(path, FsPermission.getDefault)
  val out = new BufferedOutputStream(os)
  println(data.length)
  out.write(data)
  out.flush()
  out.close()
  fs.close()
}
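
For reference, I call it roughly like this (the namenode host/port and the file content below are just placeholder values, not my real cluster settings):

// Hypothetical invocation; replace host, port, path and content with real values
write("hdfs://namenode-host:8020", "/user/testuser/test.txt", "some test content".getBytes("UTF-8"))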

Thanks


1 Answer


For writing a file to HDFS, you should only use HDFS commands such as copyFromLocal.

Assuming you are using Spark with Scala, you need to use Spark's file-writing API, for example:

some_dataframe.write.mode(SaveMode.Overwrite).parquet("c:\\MyWorkSpace\\Spark\\")

The above command is understood by HDFS, which replicates the data as per the replication factor. But if you use the Scala file system API against HDFS, it causes issues, because Scala cannot understand HDFS features like replication, data blocks and partitions.
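
A minimal, self-contained sketch of that Spark approach (the SparkSession setup, the example dataframe and the HDFS output URI are placeholders, not values from the question):

import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch only: build a session, create a small example dataframe and write it to HDFS.
// The app name, the dataframe contents and the output path are placeholders.
val spark = SparkSession.builder().appName("hdfs-write-example").getOrCreate()
import spark.implicits._

val some_dataframe = Seq(("a", 1), ("b", 2)).toDF("key", "value")

some_dataframe.write
  .mode(SaveMode.Overwrite)
  .parquet("hdfs://namenode-host:8020/user/testuser/output")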

  • Scala doesn't need to understand replication or block placement... The FileSystem API is no different than calling `hdfs put` from the command line – OneCricketeer Mar 19 '18 at 13:29
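
To illustrate the comment's point, here is a minimal sketch of copying a local file into HDFS through the same FileSystem API, which is essentially what `hdfs dfs -put` / copyFromLocal does (the namenode URI and both paths are placeholders):

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: programmatic equivalent of `hdfs dfs -put`; the URI and paths are placeholders.
val conf = new Configuration()
val fs = FileSystem.get(new URI("hdfs://namenode-host:8020"), conf)
fs.copyFromLocalFile(new Path("/tmp/test.txt"), new Path("/user/testuser/test.txt"))
fs.close()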