
My data's schema looks like this:

root
 |-- dt: string (nullable = true)
 |-- myarr: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)


"myarr" is a Seq of size 2 constantly.

I'm trying to refer to the elements of this array inside a udf call:

filter(callUDF("size", $"myarr[0]") > 0 && callUDF("size", $"myarr[1]") > 0)

but I'm getting the below exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`myarr[0]`' given input columns: [dt, myarr];;
'Filter (('size('myarr[0]) > 0) && ('size('myarr[1]) > 0))

Any ideas why?

Thanks!

Maayan
  • It does work like this BTW `.filter("size(myarr[0]) > 0 AND size(myarr[1]) > 0")` – Maayan Apr 27 '18 at 05:04
  • Try using `$"myarr"(0)` to get the element. If you write `$"myarr[0]"` Spark will look for a column with that exact name, including the `[0]` part. – Shaido Apr 27 '18 at 05:14
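
As the comments point out, `$"myarr[0]"` is parsed as a reference to a single column literally named `myarr[0]`, while `$"myarr"(0)` (or a SQL-string filter) indexes into the array. A minimal sketch reproducing both behaviors, using hypothetical data that matches the schema above:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.size

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical rows matching the schema: an array of two string->string maps per row.
    val df = Seq(
      ("2018-04-27", Seq(Map("a" -> "1"), Map("b" -> "2"))),
      ("2018-04-28", Seq(Map.empty[String, String], Map("c" -> "3")))
    ).toDF("dt", "myarr")

    // Fails with AnalysisException: Spark looks for a column named exactly "myarr[0]".
    // df.filter(size($"myarr[0]") > 0).show()

    // Works: Column.apply(0) becomes an element-access expression.
    df.filter(size($"myarr"(0)) > 0 && size($"myarr"(1)) > 0).show()

    // Also works: the SQL parser understands the bracket syntax inside a string.
    df.filter("size(myarr[0]) > 0 AND size(myarr[1]) > 0").show()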

1 Answer


This works:

 filter(size($"myarr"(0)) > 0 && size($"myarr"(1)) > 0)
Ramesh Maharjan

  • And I also found out that I can use `size(..)` instead of `callUDF("size", ...)` – Maayan
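
Continuing the sketch above, a few equivalent spellings of the same predicate (`getItem` is the explicit form of `Column.apply`; `callUDF` resolves `size` through the session's function registry, where the built-in is also registered, so `functions.size` simply avoids the indirection):

    import org.apache.spark.sql.functions.{callUDF, size}

    // Equivalent predicates; the original callUDF form also resolves once the
    // element access is expressed correctly.
    df.filter(size($"myarr".getItem(0)) > 0 && size($"myarr".getItem(1)) > 0).show()
    df.filter(callUDF("size", $"myarr"(0)) > 0 && callUDF("size", $"myarr"(1)) > 0).show()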