
I am new to both Spark and Scala, so this might be a very basic question.

I created a text file with four lines of words. The rest of my code is below:

val data = sc.textFile("file:///home//test.txt").map(x=> x.split(" "))

println(data.collect)
println(data.take(2))
println(data.collect.foreach(println))

All of the above println calls produce output like this: [Ljava.lang.String;@1ebec410

How can I display the actual contents of the RDD? I have even tried saveAsTextFile, but it also saves the same "java..." line.

I am using the IntelliJ IDE for Spark/Scala. I have gone through other posts related to this, but they did not help. Thanks in advance.

SarahB

2 Answers

data is of type RDD[Array[String]], so each element is an Array[String]. You were printing those arrays, which show up as something like [Ljava.lang.String;@1ebec410 because arrays do not override the toString() method. You therefore get the default Object.toString(): the class name followed by @ and the object's hash code in hexadecimal, not the contents.
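A minimal plain-Scala sketch of where that string comes from (no Spark is needed, because collect returns an ordinary Array). The words array is a hypothetical stand-in for one line of test.txt after split; the sketch rebuilds the printed text from the class name and hash code to show that they match.

```scala
object ArrayToStringDemo {
  // Hypothetical stand-in for one element of the RDD: a line split on spaces
  val words: Array[String] = "hello spark world".split(" ")

  // Arrays inherit Object.toString, which is
  // getClass.getName + "@" + Integer.toHexString(hashCode)
  val printed: String = words.toString
  val rebuilt: String =
    words.getClass.getName + "@" + Integer.toHexString(words.hashCode)

  def main(args: Array[String]): Unit = {
    println(printed)            // e.g. [Ljava.lang.String;@1ebec410 (hash differs per run)
    println(printed == rebuilt) // true
    // "[Ljava.lang.String;" is the JVM's internal name for "array of String"
    println(words.getClass.getName)
  }
}
```

The hex suffix is an identity hash, so it changes from run to run and says nothing about the words in the array.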

You can convert (not cast) the Array[String] to a List[String] with the toList method, which Scala makes available on arrays through an implicit conversion. You will then see the contents, because List in Scala does override toString() to show its elements.

That means if you try

data.collect.foreach(arr => println(arr.toList))

this will show you the contents. Alternatively, as @Raphael suggested, data.collect().foreach(arr => println(arr.mkString(", "))) also works, because arr.mkString(", ") converts the array into a single String with the elements separated by ", ".
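A small sketch comparing the two fixes, assuming a hypothetical collected value that mimics what data.collect returns for a two-line file (an Array holding one Array[String] per line):

```scala
object ShowContentsDemo {
  // Hypothetical stand-in for data.collect: one Array[String] per line
  val collected: Array[Array[String]] = Array(
    "the quick brown fox".split(" "),
    "jumps over".split(" ")
  )

  // toList: List overrides toString, so the elements become visible
  val asLists: List[String] = collected.toList.map(arr => arr.toList.toString)

  // mkString: builds one String with ", " between the elements
  val joined: List[String] = collected.toList.map(arr => arr.mkString(", "))

  def main(args: Array[String]): Unit = {
    asLists.foreach(println) // List(the, quick, brown, fox) then List(jumps, over)
    joined.foreach(println)  // the, quick, brown, fox then jumps, over
  }
}
```

The same idea should help with saveAsTextFile, which writes each element's string representation: mapping the arrays to strings first (for example data.map(_.mkString(" ")) before saving) avoids writing the hash-code lines.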

Hope this clears up your doubt. Thanks

Akash Sethi

data is of type RDD[Array[String]], so what you print is the toString of an Array[String] ([Ljava.lang.String;@1ebec410). Try this:

data.collect().foreach(arr => println(arr.mkString(", ")))
Raphael Roth
  • This works; I tried it too. But is there an explanation for this? I would love to understand the details behind it. Could you please point me to a link or something? – SarahB Apr 28 '17 at 06:43
  • I changed map to flatMap, keeping the rest of the code the same. Now, for each line, every word is split into letters. I want to understand how map and flatMap work; this is getting frustrating for me. – SarahB Apr 28 '17 at 06:46
  • `flatMap` returns an `RDD[String]` by giving each element of the Array returned by your function its own RDD row, essentially flattening the 2D structure. – ImDarrenG Apr 28 '17 at 08:23
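To illustrate the comments above, a plain-Scala sketch that uses a List in place of the RDD (RDD.map and RDD.flatMap follow the same element-by-element semantics). The lines value is hypothetical. It also suggests a likely reason for the "letters" output: if the arr.mkString(", ") printing code was kept after switching to flatMap, each element is now a single word, and mkString on a String joins its characters.

```scala
object MapVsFlatMapDemo {
  // Hypothetical stand-in for the lines of test.txt
  val lines: List[String] = List("hello world", "spark scala")

  // map: exactly one output element per line, so the result is nested
  val mapped: List[Array[String]] = lines.map(line => line.split(" "))

  // flatMap: every word becomes its own element, so the result is flat
  val flat: List[String] = lines.flatMap(line => line.split(" ").toList)

  // Each element is now a String; mkString treats it as a sequence of Chars
  val lettered: List[String] = flat.map(word => word.mkString(", "))

  def main(args: Array[String]): Unit = {
    mapped.foreach(arr => println(arr.mkString(", "))) // hello, world then spark, scala
    flat.foreach(println)                               // one word per line
    lettered.foreach(println)                           // h, e, l, l, o ...
  }
}
```

With flatMap, printing each element directly (data.collect().foreach(println)) should show one word per line.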