Scala RDD[String] to RDD[String,String]

Question

I have an RDD[String] which contains the following data:

data format : ('Movie Name','Actress Name')

('Night of the Demons (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')
('The Bad Lieutenant: Port of Call - New Orleans (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha') 
('"Please Like Me" (2013) {All You Can Eat (#1.4)}', '$haniqua') 
('"Please Like Me" (2013) {French Toast (#1.2)}', '$haniqua') 
('"Please Like Me" (2013) {Horrible Sandwiches (#1.6)}', '$haniqua')

I want to convert this to RDD[String,String], such that the first element within ' ' becomes the first String in the RDD and the second element within ' ' becomes the second String.

I tried this:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
val splitRdd = rdd1.map( line => line.split(",") )
splitRdd.foreach(println)

but it's giving me what looks like an error:

[Ljava.lang.String;@7741fb9
[Ljava.lang.String;@225f63a5
[Ljava.lang.String;@63640bc4
[Ljava.lang.String;@1354c1de

That isn't an error message, that's the object-ids for a bunch of strings. — Michael Lorton, Oct 08 '16 at 01:41
@Malvolio Can you please tell me how can I remove that error — user225508, Oct 08 '16 at 01:50

score 5 · Answer 1 · edited May 23 '17 at 12:00

[Ljava.lang.String;@7741fb9 is not an error. This is what is printed when you try to print an array directly.

[ - a single-dimensional array

L - the array elements are instances of a class or interface

java.lang.String - the type of objects in the array

@ - separates the type name from the hash code

7741fb9 - the hash code of the object, in hexadecimal
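The breakdown above can be checked in plain Scala without Spark. The following is a minimal sketch; the object name `ArrayToStringDemo` and the helper `describe` are made up for illustration.

```scala
// Plain Scala, no Spark needed: shows where "[Ljava.lang.String;@..." comes from.
object ArrayToStringDemo {
  // Rebuilds the default toString text by hand:
  // runtime class name + "@" + hash code in hexadecimal.
  def describe(arr: Array[String]): String =
    arr.getClass.getName + "@" + Integer.toHexString(arr.hashCode)

  def main(args: Array[String]): Unit = {
    val fields = "a,b".split(",")
    // The JVM's internal name for a one-dimensional array of String.
    println(fields.getClass.getName)
    // The hand-built text is the same as what println(fields) would show.
    println(describe(fields) == fields.toString)
  }
}
```

The hash code part differs from run to run, so only the `[Ljava.lang.String;@` prefix is stable.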

To print a String array, you can try this code:

import scala.runtime.ScalaRunTime._
splitRdd.foreach(array => println(stringOf(array)))
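As an alternative that avoids the `scala.runtime` import, `mkString` from the standard library can render the array contents. This is a small sketch in plain Scala; the object name `PrintArrayDemo` and the helper `render` are hypothetical.

```scala
object PrintArrayDemo {
  // Renders an array's contents instead of its default toString.
  def render(fields: Array[String]): String =
    fields.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    val fields = "a,b,c".split(",")
    println(render(fields)) // [a, b, c]
  }
}
```

On the RDD from the question, the same idea would presumably look like `splitRdd.foreach(fields => println(fields.mkString("[", ", ", "]")))`.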

Kris · Answer 2 · 2016-10-08T17:03:27.247

0

It's not an error. We could also use flatMap() here to avoid confusion:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
rdd1.flatMap( line => line.split(",")).foreach(println)

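Here, the function passed to map returns a single element per input line (an array), while the function passed to flatMap returns a list of elements (0 or more) per line. The output of flatMap is also flattened, so the result is an RDD[String] of individual fields rather than an RDD of arrays.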
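The difference between map and flatMap can be illustrated on an ordinary Scala List, used here as a stand-in for an RDD. The object name and the sample lines are invented for the sketch.

```scala
object MapVsFlatMapDemo {
  // Two sample lines standing in for the contents of an RDD[String].
  val lines: List[String] = List("a,b", "c,d,e")

  // map keeps one output element per input line; each element is an Array[String].
  val mapped: List[Array[String]] = lines.map(_.split(","))

  // flatMap concatenates the per-line results into one flat list of fields.
  val flattened: List[String] = lines.flatMap(_.split(",").toList)

  def main(args: Array[String]): Unit = {
    println(mapped.length)  // 2
    println(flattened)      // List(a, b, c, d, e)
  }
}
```

Note that after flattening, nothing records which fields came from the same line, so flatMap on its own does not produce the movie/actress pairs the question asks for.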

edited Oct 08 '16 at 17:03

answered Oct 08 '16 at 13:35

score 0 · Answer 3 · answered Oct 09 '16 at 02:09

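Since it is a CSV-like file whose fields are enclosed in single quotes and whose rows are enclosed in parentheses, you need to parse each line with a regular expression. A simple split on commas doesn't work, because some fields contain commas themselves (for example '"Steff", Stefanie Oxmann Mcgaha').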
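One possible way to do this is sketched below. It assumes every line has exactly the shape `('movie', 'actress')` shown in the question, and that the sequence `', '` appears only between the two fields; the object name `ActressLineParser`, the pattern, and the `parse` helper are illustrative, not something from the original post.

```scala
object ActressLineParser {
  // Assumed line shape: ('<movie>', '<actress>') with optional trailing whitespace.
  private val Pair = """^\('(.*)', '(.*)'\)\s*$""".r

  // Returns Some((movie, actress)) for a well-formed line, None otherwise.
  def parse(line: String): Option[(String, String)] = line match {
    case Pair(movie, actress) => Some((movie, actress))
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    val line =
      """('Night of the Demons (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')"""
    println(parse(line))
    // With Spark (assumed usage, not run here), malformed lines are dropped:
    //   val pairs = rdd1.flatMap(line => ActressLineParser.parse(line))
  }
}
```

Returning an Option keeps lines that don't match from throwing an exception; whether silently dropping them is acceptable depends on the data.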

KiranM

score 0 · Answer 4 · answered Oct 15 '16 at 15:56

Try this to convert RDD[String] to RDD[String,String]:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
// Split each line once and keep the first two fields as a (key, value) tuple.
val splitRdd = rdd1.map { line =>
  val fields = line.split(",")
  (fields(0), fields(1))
}

The code above returns an RDD of key-value pairs (tuples), i.e. RDD[(String, String)].
