I am new to Hadoop and Spark. I am using Spark 2.1.1 (spark-2.1.1-bin-hadoop2.7), and I want to read data from HDFS (Hadoop 2.7.3) using SparkR.
I know I can point to my file with an "hdfs://somepath-to-my-file" URI, but I could not find a SparkR function that does the job; read.df() doesn't work for me.
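This is roughly the call I have been trying (the path is just a placeholder for my actual file; the "csv" source and the header/inferSchema options are my reading of the read.df() docs):

# attempted read of a CSV on HDFS into a SparkDataFrame
df <- read.df("hdfs://somepath-to-my-file", source = "csv", header = "true", inferSchema = "true")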
I am using sparkR.session() to connect to my Spark session. To launch the R interface for Spark, I ran sparkR from Spark's bin directory.
In short, I want to load a CSV file from HDFS using SparkR.
Please help. If possible, provide an example.
Thanks, SG
In PySpark or Scala, reading from HDFS works; see the Scala example below:
scala> val myFile="hdfs://localhost:9000/mydata/train.csv"
myFile: String = hdfs://localhost:9000/mydata/train.csv
scala> val txtfile = sc.textFile(myFile)
txtfile: org.apache.spark.rdd.RDD[String] = hdfs://localhost:9000/mydata/train.csv MapPartitionsRDD[1] at textFile at
scala> txtfile.count()
res0: Long = 892
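By analogy, I expected something along these lines to work in SparkR against the same file (the options are my guess at the right CSV settings, based on the read.df() documentation), but I haven't been able to get it working:

library(SparkR)
sparkR.session()

# read the same CSV from HDFS into a SparkDataFrame
train <- read.df("hdfs://localhost:9000/mydata/train.csv",
                 source = "csv", header = "true", inferSchema = "true")

count(train)        # should report a row count comparable to the Scala example above
printSchema(train)  # to check the inferred column types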