
My data's schema looks like this:

root
 |-- dt: string (nullable = true)
 |-- myarr: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)


"myarr" is a Seq of size 2 constantly.

I'm trying to refer to the elements of this array inside a udf call:

filter(callUDF("size", $"myarr[0]") > 0 && callUDF("size", $"myarr[1]") > 0)

but I'm getting the below exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`myarr[0]`' given input columns: [dt, myarr];;
'Filter (('size('myarr[0]) > 0) && ('size('myarr[1]) > 0))

Any ideas why?

Thanks!

Maayan
  • It does work like this BTW `.filter("size(myarr[0]) > 0 AND size(myarr[1]) > 0")` – Maayan Apr 27 '18 at 05:04
  • Try using `$"myarr"(0)` to get the element. If you write `$"myarr[0]"` Spark will look for a column with that exact name, including the `[0]` part. – Shaido Apr 27 '18 at 05:14
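
As the comments point out, `$"myarr[0]"` is parsed as a reference to a single column literally named `myarr[0]`, while `$"myarr"(0)` (or a SQL-string filter) indexes into the array. A minimal sketch reproducing both behaviors, using hypothetical data that matches the schema above:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.size

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical rows matching the schema: an array of two string->string maps per row.
    val df = Seq(
      ("2018-04-27", Seq(Map("a" -> "1"), Map("b" -> "2"))),
      ("2018-04-28", Seq(Map.empty[String, String], Map("c" -> "3")))
    ).toDF("dt", "myarr")

    // Fails with AnalysisException: Spark looks for a column named exactly "myarr[0]".
    // df.filter(size($"myarr[0]") > 0).show()

    // Works: Column.apply(0) becomes an element-access expression.
    df.filter(size($"myarr"(0)) > 0 && size($"myarr"(1)) > 0).show()

    // Also works: the SQL parser understands the bracket syntax inside a string.
    df.filter("size(myarr[0]) > 0 AND size(myarr[1]) > 0").show()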

1 Answer


This works:

 filter(size($"myarr"(0)) > 0 && size($"myarr"(1)) > 0)
Ramesh Maharjan

  • And I also found out that I can use `size(..)` instead of `callUDF("size", ...)` – Maayan
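
Continuing the sketch above, a few equivalent spellings of the same predicate (`getItem` is the explicit form of `Column.apply`; `callUDF` resolves `size` through the session's function registry, where the built-in is also registered, so `functions.size` simply avoids the indirection):

    import org.apache.spark.sql.functions.{callUDF, size}

    // Equivalent predicates; the original callUDF form also resolves once the
    // element access is expressed correctly.
    df.filter(size($"myarr".getItem(0)) > 0 && size($"myarr".getItem(1)) > 0).show()
    df.filter(callUDF("size", $"myarr"(0)) > 0 && callUDF("size", $"myarr"(1)) > 0).show()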