Is it possible for a Spark UDF to return more than one value? If so, how are the individual items accessed in the DataFrame API?
UDFs can only return a single column value. That value can be a collection or a tuple, but it cannot be several separate columns. If you really need to, you can return a tuple and then split it by selecting its fields, e.g. `$"colname._1"`, `$"colname._2"`, etc. – evan.oman Dec 27 '16 at 00:37
Related question: http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe – savx2 Dec 27 '16 at 03:26
2 Answers
You have three options:
- Return a `Seq` of items of the same type to create an `array` column: `udf(() => Seq(1.0, 2.0, 3.0))`
- Return a `Map` to create a `map` column: `udf(() => Map("x" -> 1.0, "y" -> -1.0))`
- Return a product (a tuple or an instance of a case class) to create a `struct` column: `udf(() => (1.0, "foo", 5))`
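
For the second part of the question (accessing the individual items), here is a minimal sketch, assuming Spark 2.x or later with the Scala API. The object name `MultiValueUdf`, the case class `Stats`, and the column names `arr`, `m`, `t`, `s` are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object MultiValueUdf {
  // Hypothetical case class; its field names become the struct field names.
  case class Stats(value: Double, label: String, count: Int)

  // One UDF per return type discussed above.
  val asArray  = udf((x: Double) => Seq(x, x * 2, x * 3))   // array column
  val asMap    = udf((x: Double) => Map("x" -> x, "y" -> -x)) // map column
  val asTuple  = udf((x: Double) => (x, "foo", 5))          // struct with fields _1, _2, _3
  val asStruct = udf((x: Double) => Stats(x, "foo", 5))     // struct with named fields

  // Adds one column per UDF to a DataFrame that has a Double column "x".
  def withMultiValues(df: DataFrame): DataFrame =
    df.withColumn("arr", asArray(col("x")))
      .withColumn("m", asMap(col("x")))
      .withColumn("t", asTuple(col("x")))
      .withColumn("s", asStruct(col("x")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-value-udf")
      .getOrCreate()
    import spark.implicits._

    val result = withMultiValues(Seq(1.0, 2.0).toDF("x"))

    result.select(
      $"arr"(0).as("arr_first"), // array element by index
      $"m"("y").as("m_y"),       // map value by key
      $"t._1".as("t_first"),     // tuple fields are named _1, _2, ...
      $"s.label".as("s_label")   // case class fields keep their names
    ).show()

    // Flatten every field of the struct into top-level columns in one select.
    result.select($"x", $"s.*").show()

    spark.stop()
  }
}
```

The `$"s.*"` form is probably the closest thing to "flattening" the returned value: it still needs a `select`, but it expands all struct fields at once instead of naming each one.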

user7337271
Thanks. How about the second part of the question? My current solution is to add an additional `select` operation to access the individual items. Is there another way to flatten the returned values? – savx2 Dec 27 '16 at 03:29