
It's a nested map with contents like this when I print it to the screen:

(5, Map ( "ABCD" -> Map("3200" -> 3,
                    "3350.800" -> 4, 
                    "200.300" -> 3)
 (1, Map ( "DEF" -> Map("1200" -> 32,
                        "1320.800" -> 4, 
                        "2100" -> 3)

I need to get something like this:

CaseClass(5, ABCD, 3200, 3)
CaseClass(5, ABCD, 3350.800, 4)
CaseClass(5, ABCD, 200.300, 3)
CaseClass(1, DEF, 1200, 32)
CaseClass(1, DEF, 1320.800, 4)

and so on: basically, a list of case classes.

I then need to map this to case class objects so that I can save them to Cassandra. I have tried flatMapValues, but that un-nests the map only one level. I have also used flatMap, but that doesn't work either (or I'm making mistakes).
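For context, a minimal sketch of what I mean, assuming the data sits in a Spark pair RDD (nested is just a placeholder name):

// Hypothetical: nested has type RDD[(Int, Map[String, Map[String, Int]])]
val oneLevel = nested.flatMapValues(identity)
// oneLevel: RDD[(Int, (String, Map[String, Int]))] -- the inner Map is still nested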

Any suggestions?

2 Answers


Fairly straightforward using a for-comprehension and some pattern matching to destructure things:

val in = List(
  (5, Map("ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
  (1, Map("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3))))

case class Thing(a: Int, b: String, c: String, d: Int)

for {
  (index, m)       <- in
  (k, v)           <- m
  (innerK, innerV) <- v
} yield Thing(index, k, innerK, innerV)

//> res0: List[maps.maps2.Thing] = List(Thing(5,ABCD,3200,3), 
//                                      Thing(5,ABCD,3350.800,4),
//                                      Thing(5,ABCD,200.300,3), 
//                                      Thing(1,DEF,1200,32),
//                                      Thing(1,DEF,1320.800,4),
//                                      Thing(1,DEF,2100,3))

So let's pick apart the for-comprehension.

(index, m) <- in

This is the same as

t <- in
(index, m) = t

In the first line, t will successively be set to each element of in, so t is a tuple (Int, Map(...)). Pattern matching lets us put that pattern for the tuple on the left-hand side, and the compiler picks apart the tuple, setting index to the Int and m to the Map.
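As a quick runnable illustration of that equivalence, using the same in as above and yielding just the key sets so the shape is visible:

for {
  t <- in            // t: (Int, Map[String, Map[String, Int]])
  (index, m) = t     // the pattern on the left destructures the tuple
} yield (index, m.keySet)
//> List((5,Set(ABCD)), (1,Set(DEF)))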

(k, v) <- m

As before this is equivalent to

u <- m
(k, v) = u

And this time u takes each element of the Map m. Those elements are again tuples of key and value, so k is set successively to each key and v to the corresponding value.

And v is your inner map, so we do the same thing again with it:

(innerK, innerV) <- v

Now we have everything we need to create the case class. yield just says make a collection of whatever is "yielded" each time through the loop.

yield Thing(index, k, innerK, innerV) 
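If yield is unfamiliar, a tiny standalone example with made-up values:

val doubled = for (x <- List(1, 2, 3)) yield x * 2
//> doubled: List[Int] = List(2, 4, 6)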

Under the hood, this just translates to a series of maps/flatMaps.

The yield is just the value Thing(index, k, innerK, innerV)

We get one of those for each element of v

v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) }

but there's an inner map per element of the outer map

m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } }

(flatMap because we get a List of Lists if we just did a map and we want to flatten it to just the list of items)
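A minimal illustration of that difference, with made-up values:

List(1, 2).map(n => List(n, n))      //> List(List(1, 1), List(2, 2))
List(1, 2).flatMap(n => List(n, n))  //> List(1, 1, 2, 2)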

Similarly, we do one of those for every element in the List

in.flatMap { z => val (index, m) = z; m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } } }

Let's do that in _1, _2 style.

in.flatMap { z => z._2.flatMap { y => y._2.map { x => Thing(z._1, y._1, x._1, x._2) } } }

which produces exactly the same result. But isn't it clearer as a for-comprehension?

The Archetypal Paul
  • thank you @Paul... your answer is a bit tricky for me to understand... can you explain a bit more about what's happening? I'm fairly new to Scala, and to pattern matching in particular – sainath reddy Jun 16 '15 at 18:58
  • You're missing some cool tools if you don't know pattern matching, so I recommend you read up on it. I'll add an explanation – The Archetypal Paul Jun 16 '15 at 20:04
  • I am for sure... will read up. @paul, any suggestions for blog pages that have involved pattern matching examples? Your answer makes complete sense now. Thanks for the explanation – sainath reddy Jun 16 '15 at 20:34
  • Any decent Scala book will cover this. I don't know of one to particularly recommend. – The Archetypal Paul Jun 16 '15 at 20:42
  • @The Archetypal Paul can you help and suggest how to handle this https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:49

You can do it like this if you prefer plain collection operations:

    case class Record(v1: Int, v2: String, v3: Double, v4: Int)

    val data = List(
      (5, Map("ABC" ->
        Map(
          3200.0 -> 3,
          3350.800 -> 4,
          200.300 -> 3))),
      (1, Map("DEF" ->
        Map(
          1200.0 -> 32,
          1320.800 -> 4,
          2100.0 -> 3))))

    val rdd = sc.parallelize(data)

    val result = rdd.flatMap(p => {
      p._2.toList
        .flatMap(q => q._2.toList.map(l => (q._1, l)))
        .map((p._1, _))
    }).map(p => Record(p._1, p._2._1, p._2._2._1, p._2._2._2))

    println(result.collect.toList)
    //List(
    //  Record(5,ABC,3200.0,3),
    //  Record(5,ABC,3350.8,4),
    //  Record(5,ABC,200.3,3),
    //  Record(1,DEF,1200.0,32),
    //  Record(1,DEF,1320.8,4),
    //  Record(1,DEF,2100.0,3)
    //)
abalcerek
  • thank you @user52045, I figured this out from http://stackoverflow.com/questions/30080136/scala-spark-array-mapping an hour ago and forgot to write up an answer... anyway, thank you, I was doing exactly the same thing. – sainath reddy Jun 16 '15 at 18:55
  • That can be shortened. You don't need to convert Maps to Lists before mapping over them. – The Archetypal Paul Jun 16 '15 at 20:26
  • Yeah, I was about to point that out as well... both your answers are right, but Paul's answer is more readable after his explanation. Thank you @Paul – sainath reddy Jun 16 '15 at 20:30
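A minimal sketch of the shortened version mentioned in the comments above, reusing the rdd and Record from the answer (Maps can be flatMapped and mapped over directly):

    // Sketch only: no toList needed, a Map is already an iterable collection of pairs
    val shortened = rdd.flatMap { case (id, outer) =>
      outer.flatMap { case (name, inner) =>
        inner.map { case (key, count) => Record(id, name, key, count) }
      }
    }
    // shortened.collect produces the same Records as result above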