I have an RDD[String] that contains the following data:

data format : ('Movie Name','Actress Name')

('Night of the Demons (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')
('The Bad Lieutenant: Port of Call - New Orleans (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha') 
('"Please Like Me" (2013) {All You Can Eat (#1.4)}', '$haniqua') 
('"Please Like Me" (2013) {French Toast (#1.2)}', '$haniqua') 
('"Please Like Me" (2013) {Horrible Sandwiches (#1.6)}', '$haniqua')

I want to convert this to an RDD[(String, String)], so that the first element enclosed in ' ' becomes the first String of each pair and the second element enclosed in ' ' becomes the second String.

I tried this:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
val splitRdd = rdd1.map( line => line.split(",") )
splitRdd.foreach(println)

but it gives me the following error:

[Ljava.lang.String;@7741fb9
[Ljava.lang.String;@225f63a5
[Ljava.lang.String;@63640bc4
[Ljava.lang.String;@1354c1de
OneCricketeer
user225508

4 Answers

[Ljava.lang.String;@7741fb9 is not an error. It is what gets printed when you print an array directly, because arrays use the default toString:

[ - a single-dimensional array

L - the array contains a class or interface

java.lang.String - the type of the objects in the array

@ - separates the type name from the hash code

7741fb9 - the hash code of the array object, in hexadecimal

To print the contents of a String array, you can try this code:

import scala.runtime.ScalaRunTime._
splitRdd.foreach(array => println(stringOf(array)))
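
As an alternative sketch that needs no import, the array can be rendered with the standard library's mkString. The object name PrintArrayDemo and the helper render are hypothetical names used only for illustration, and a plain local array stands in for one element of splitRdd:

```scala
object PrintArrayDemo {
  // Render an array's contents instead of relying on its default toString.
  def render(fields: Array[String]): String =
    fields.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    val fields = "a,b,c".split(",")
    println(fields)          // type tag and hash code, like [Ljava.lang.String;@...
    println(render(fields))  // the actual contents
  }
}
```

On the RDD from the question, the same idea would presumably be written as splitRdd.foreach(array => println(array.mkString(", "))).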

Source

Community
bob

It's not an error. We could also use flatMap() here to avoid the confusion:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
rdd1.flatMap( line => line.split(",")).foreach(println)

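Here, the function passed to map returns a single element per line (an array), while the function passed to flatMap returns a collection of zero or more elements per line, and flatMap flattens those collections into one RDD[String]. Note that after flattening every field is its own element, so the movie and the actress from one line are no longer paired.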
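
The difference can be sketched on plain Scala Lists, which are assumed here only so that the example runs without a SparkContext. The object name MapVsFlatMap and the sample lines are made up for illustration:

```scala
object MapVsFlatMap {
  // Two hypothetical input lines, standing in for the contents of rdd1.
  val lines: List[String] = List("a,b", "c,d")

  // map: one result per input line, so the structure stays nested.
  val mapped: List[List[String]] = lines.map(line => line.split(",").toList)

  // flatMap: the per-line results are concatenated into one flat list.
  val flattened: List[String] = lines.flatMap(line => line.split(",").toList)

  def main(args: Array[String]): Unit = {
    println(mapped)
    println(flattened)
  }
}
```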

Kris

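Since this is effectively a CSV file whose fields are enclosed in quotes and whose rows are enclosed in parentheses, you need to parse each line with a regular expression. A simple split(",") doesn't work, because a comma can also occur inside a field (for example '"Steff", Stefanie Oxmann Mcgaha').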
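
A minimal sketch of such a regular expression follows. It assumes every line has exactly the shape ('movie', 'actress') with the two fields separated by the sequence ', ' and that this sequence does not occur inside a field. The names ActressLineParser, PairPattern and parse are hypothetical:

```scala
object ActressLineParser {
  // Matches ('<movie>', '<actress>') with optional trailing whitespace.
  // Assumption: the separator ', ' occurs only once, between the two fields.
  private val PairPattern = """^\('(.*)', '(.*)'\)\s*$""".r

  // Returns Some((movie, actress)) for a well-formed line, None otherwise.
  def parse(line: String): Option[(String, String)] = line match {
    case PairPattern(movie, actress) => Some((movie, actress))
    case _                           => None
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """('Night of the Demons (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')"""
    println(parse(sample))
  }
}
```

With the rdd1 from the question, something like rdd1.flatMap(line => ActressLineParser.parse(line)) should then give an RDD[(String, String)] and silently drop lines that do not match; whether dropping is acceptable depends on the data.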

KiranM

Try this to convert an RDD[String] to an RDD[(String, String)]:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
// Split each line once and keep the first two fields as a tuple
val splitRdd = rdd1.map { line => val f = line.split(","); (f(0), f(1)) }

The line above returns a pair RDD of (key, value) tuples, i.e. an RDD[(String, String)].
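
One caveat is worth checking against the sample data: a plain split(",") also cuts at commas inside a field. The sketch below applies the same split to the first sample line from the question, using a local String instead of an RDD; the object name SplitPitfall is hypothetical:

```scala
object SplitPitfall {
  // First sample line from the question, copied verbatim.
  val line: String =
    """('Night of the Demons (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')"""

  // The actress field itself contains a comma, so this yields three pieces.
  val fields: Array[String] = line.split(",")

  // Same tuple construction as in the answer above.
  val pair: (String, String) = (fields(0), fields(1))

  def main(args: Array[String]): Unit = {
    println(fields.length)
    println(pair)
  }
}
```

The tuple still carries the surrounding parenthesis and quotes, and the second element is cut off at the comma inside the name, so some extra cleanup or a stricter parser is likely needed for this data.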

Shankar