
Referring to this question: "NullPointerException in Scala Spark, appears to be caused be collection type?"

The answer states: "Spark doesn't support nesting of RDDs (see https://stackoverflow.com/a/14130534/590203 for another occurrence of the same problem), so you can't perform transformations or actions on RDDs inside of other RDD operations."

This code:

val x = sc.parallelize(List(1, 2, 3))

def fun2(n: Int) = {
    n + 1
}

def fun1(n: Int) = {
    fun2(n)
}

x.map(v => fun1(v)).take(1)

prints:

Array[Int] = Array(2)

This is correct.

But doesn't this contradict "can't perform transformations or actions on RDDs inside of other RDD operations", since a nested action is occurring on an RDD?

blue-sky

1 Answer


No. In the linked question, d.filter(...) returns an RDD, so the type of

d.distinct().map(x => d.filter(_.equals(x)))

is RDD[RDD[String]]. That isn't allowed, but it doesn't happen in your code: fun1 and fun2 are ordinary functions on Int and never touch an RDD, so the closure passed to map doesn't reference any RDD. And if I understand the linked answer correctly, you also can't refer to d or any other RDD inside map, even if you don't end up with RDD[RDD[SomeType]].
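
To make that concrete, here is a minimal sketch (d and the sample data are my own illustration, not from the linked post). Referencing one RDD inside another RDD's transformation compiles, but fails at runtime because the inner operation would run inside a task on an executor, where the RDD's SparkContext is not available; a driver-side aggregation avoids the nesting entirely.

val d = sc.parallelize(List("a", "b", "a"))

// Compiles, but throws a NullPointerException when the job runs:
// the inner d.filter(...) executes inside a task on an executor,
// where d's SparkContext reference is null.
// d.distinct().map(x => d.filter(_.equals(x)).count()).collect()

// A driver-side alternative that avoids nesting: compute the same
// per-element counts with a single aggregation on d itself.
val counts = d.map(x => (x, 1L)).reduceByKey(_ + _)
counts.collect()  // e.g. Array((a,2), (b,1))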

Alexey Romanov