I want to iterate over a DataFrame and, for each row, use another DataFrame that is defined in the outer scope, but I get a NullPointerException when I access the outer variable. Here is my code:

val outer: DataFrame = ...
val someDataFrame: DataFrame = ...
someDataFrame.foreach(row => {
    outer.show()
})
// this throws java.lang.NullPointerException

Here is another case:

val outer: DataFrame = ...
val someDataFrame: DataFrame = ...
for(each <- someDataFrame) {
    outer.show()
}
// this throws java.lang.NullPointerException

But it works when looping over a List:

val outer: DataFrame = ...
val list = List(1,2,3,4,5)
list.foreach(each => {
    outer.show()
})
// this prints outer 5 times

And this case works too:

val x = 1
val someDataFrame: DataFrame = ...
someDataFrame.foreach(row => {
    println(x)
})
// prints 1 for each row

I am using Scala 2.11 and Spark 2.3.0. It seems that only accessing an outer-scope DataFrame causes the NullPointerException. Can anyone explain why this happens?

Daniel
  • The problem is not that the variable is in an outer scope, but that it is a DataFrame. You cannot use a DataFrame inside another DataFrame's map/foreach function, because that function is remote code that runs on the executors. – Raphael Roth Jun 24 '18 at 08:20
  • Is there a way I can at least refer to the outer DataFrame? – Daniel Jun 24 '18 at 08:21
  • No. You should either join the DataFrames or collect one of them (the smaller one), e.g. as a plain Scala `Map`, and then broadcast it. – Raphael Roth Jun 24 '18 at 08:24
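
The two options from the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the local SparkSession, the sample rows, the column names (`id`, `value`, `label`) and the names `joined`, `lookup`, `lookupBc` and `labelled` are all hypothetical and not taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical local session and sample data, for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("outer-dataframe-sketch")
  .getOrCreate()
import spark.implicits._

val someDataFrame: DataFrame = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val outer: DataFrame = Seq((1, "one"), (2, "two")).toDF("id", "label")

// Option 1: join the two DataFrames, so Spark combines them itself
// and no DataFrame is referenced inside a row-level function.
val joined: DataFrame = someDataFrame.join(outer, Seq("id"), "left")
joined.show()

// Option 2: collect the smaller DataFrame to the driver as a plain Scala Map
// and broadcast that Map to the executors.
val lookup: Map[Int, String] =
  outer.collect().map(r => r.getInt(0) -> r.getString(1)).toMap
val lookupBc = spark.sparkContext.broadcast(lookup)

// The function below captures only the broadcast handle, not a DataFrame,
// so it can safely run on the executors.
val labelled: Array[(Int, String)] = someDataFrame
  .map(row => (row.getInt(0), lookupBc.value.getOrElse(row.getInt(0), "unknown")))
  .collect()
```

The join is usually the simpler choice when both sides are large. The broadcast variant only makes sense when the collected DataFrame comfortably fits in memory on the driver and on every executor.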
