
Using Spark SQL's Java API, is it possible to pass a Column object, or a similarly complex expression, to Column's getItem() method? For example, if I have an array Column of size n and I want to access the item at index n/2, is there currently an elegant way to do it? Of course, I could write a UDF just for that, but that would be a very ugly solution.

Currently, if you pass a Column object to getItem(), the code compiles (since getItem() accepts an Object parameter) but throws an exception at runtime.
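
To illustrate, here is a minimal sketch of the failing call, together with the `expr`-based alternative pault suggests in the comments below. The column name `arr` and the sample data are hypothetical, and it assumes a local Spark 2.x (or later) session:

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GetItemExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("getItem-column-index")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical data: one row with an array column named "arr".
        Dataset<Row> df = spark.sql("SELECT array(10, 20, 30, 40, 50) AS arr");

        // Compiles (getItem takes Object) but fails at runtime:
        // df.select(col("arr").getItem(expr("size(arr) div 2"))).show();

        // Workaround via expr(): bracket indexing in Spark SQL is 0-based,
        // and the index can be any per-row expression.
        df.select(expr("arr[size(arr) div 2]").alias("middle")).show();

        spark.stop();
    }
}
```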

Dean Gurvitz
  • What version of Spark are you using? There's `posexplode` in v2.1 and above, which you can use with some logic to achieve this. – pault Aug 03 '18 at 19:18
  • See [my answer](https://stackoverflow.com/a/51679214/5858851) to a related pyspark question. – pault Aug 03 '18 at 20:02
  • Isn't using posexplode a bit of overkill? It also seems like this might be relatively inefficient. What I eventually did to solve the problem (the question is a year old now) was to write a UDF (a sketch of that approach follows these comments). – Dean Gurvitz Aug 05 '18 at 07:25
  • I usually try to avoid UDFs because that avoids serialization to Python, but in this case I'm not sure what's more efficient, particularly since you're using Java. However, you should be able to use the `expr` method; I think that will be better than a UDF. – pault Aug 05 '18 at 08:57
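
For reference, a minimal sketch of the UDF approach mentioned in the comments above. This is not the actual code from the question: the column name `arr`, the sample data, and the UDF name `middle` are hypothetical, and it assumes a local Spark 2.x (or later) session:

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

public class MiddleElementUdf {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("middle-element-udf")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical data: one row with an integer array column named "arr".
        Dataset<Row> df = spark.sql("SELECT array(10, 20, 30, 40, 50) AS arr");

        // Array columns are passed into Java UDFs as scala.collection.Seq.
        spark.udf().register("middle",
                (UDF1<Seq<Integer>, Integer>) a -> a.apply(a.size() / 2),
                DataTypes.IntegerType);

        df.select(callUDF("middle", col("arr")).alias("middle")).show();

        spark.stop();
    }
}
```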

0 Answers