
I am new to working with HDFS. I am trying to read a CSV file stored in a Hadoop cluster using Spark. Every time I try to access it, I get the following error: End of File Exception between local host

I have not set up Hadoop locally, since I already had access to a Hadoop cluster.

I may be missing some configuration, but I don't know which. I would appreciate any help.

I tried to debug it using this: link

It did not work for me.

This is the Spark code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Read").setMaster("local")
  .set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
  .set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
val sc = new SparkContext(conf)
val data = sc.textFile("hdfs://<some-ip>/abc.csv")

I expect it to read the CSV and convert it into an RDD.

Getting this error: Exception in thread "main" java.io.EOFException: End of File Exception between local host is:

1 Answer


Run your Spark jobs on the Hadoop cluster. Use the code below:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("Read").getOrCreate()
val data = spark.sparkContext.textFile("<filePath>")
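Since the file is a CSV, you could also load it as a DataFrame instead of a plain RDD of lines. A minimal sketch, assuming Spark 2.x; the header and inferSchema options are assumptions about your data:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("Read").getOrCreate()
// header/inferSchema are assumptions: drop them if your CSV has no
// header row, or if you want every column read as a string.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("<filePath>")
df.show()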

You can also use spark-shell.
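For example, from an edge node of the cluster, something like this should work (the path /abc.csv is a placeholder, and it assumes the cluster's default filesystem is HDFS):

$ spark-shell --master yarn
scala> val data = sc.textFile("/abc.csv")
scala> data.take(5).foreach(println)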

If you want to access HDFS from your local machine, follow this: link
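In that setup, this EOFException often means the client is talking to the wrong port, for example the NameNode's web UI port instead of its RPC port. A minimal sketch, assuming <namenode-host> is a placeholder for your NameNode and 8020 is its RPC port (9000 is another common default):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Read").setMaster("local")
val sc = new SparkContext(conf)
// <namenode-host> and 8020 are placeholders; substitute your cluster's
// actual NameNode host and RPC port from core-site.xml (fs.defaultFS).
val data = sc.textFile("hdfs://<namenode-host>:8020/abc.csv")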
