
Hi, I am trying to read image files from the local file system and store them in HDFS using Spark and Scala.

Here is my code:

val streams = spark.sparkContext.wholeTextFiles("file:///home/jeffi/input/Images_Test/")
val op = streams.toDF()  //op: org.apache.spark.sql.DataFrame = [_1: string, _2: string]
op.printSchema() //root |-- _1: string (nullable = true) |-- _2: string (nullable = true)

When I tried to write the op DataFrame to HDFS, I got the following exception:

 op.write.text("/home/cisadmin/image_op")

org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 2 columns.;

I tried various calls on the write method, such as op.write and op.write.wholeTextFiles(""), but nothing works for me. Any help would be appreciated.

Teju Priya

1 Answer


Regarding your error: if you check the documentation for the text method, it says:

Saves the content of the [[DataFrame]] in a text file at the specified path.
The DataFrame must have only one column that is of string type.
Each row becomes a new line in the output file.

But in your case op has two columns, so you can either save the DataFrame as CSV, or convert it to an RDD and then save it as a text file.
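For example, either of the following approaches avoids the single-column restriction. This is a minimal sketch; the output paths are placeholders for your own HDFS locations:

```scala
// Option 1: save both columns as CSV (path is an example).
op.write.csv("hdfs:///user/jeffi/image_op_csv")

// Option 2: convert to an RDD of strings and save as text;
// here each row is joined with a tab separator.
op.rdd.map(row => row.mkString("\t")).saveAsTextFile("hdfs:///user/jeffi/image_op_text")

// Option 3: select a single string column, then text works.
op.select("_1").write.text("hdfs:///user/jeffi/image_paths")
```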

But as Ramesh Maharjan mentioned, you should not use text APIs to read image files in the first place. Images are binary data, and wholeTextFiles decodes file contents as strings, which will corrupt them.
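A better fit is sparkContext.binaryFiles, which returns an RDD[(String, PortableDataStream)] and does not attempt any text decoding. The sketch below is one possible way to copy the raw bytes into HDFS; the input and output paths are taken from your question, and the copy logic is an illustrative example, not the only approach:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Read each image as (file path, binary stream) -- no text decoding.
val images = spark.sparkContext.binaryFiles("file:///home/jeffi/input/Images_Test/")

// Write each image's raw bytes into HDFS, preserving the file name.
images.foreach { case (path, stream) =>
  val fs = FileSystem.get(new Configuration())
  val name = path.substring(path.lastIndexOf('/') + 1)
  val out = fs.create(new Path("/home/cisadmin/image_op/" + name))
  try out.write(stream.toArray()) finally out.close()
}
```

Note that the Hadoop FileSystem handle is created inside the closure so that it is instantiated on the executors rather than serialized from the driver.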

vindev