3

Is it possible for a Spark UDF to return more than one value? If so, how are the individual items accessed in the DataFrame API?

savx2
  • 2
    UDFs can only return a single column value. That value can be a collection or a tuple, but a UDF can't return multiple columns. If you really need to, you can return a tuple and then split it with selectors like `$"colname._1"`, `$"colname._2"`, etc. – evan.oman Dec 27 '16 at 00:37
  • related question: http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe – savx2 Dec 27 '16 at 03:26

2 Answers

4

You have three options:

  • Return a Seq of items of the same type to create an array column.

    udf(() => Seq(1.0, 2.0, 3.0))
    
  • Return a Map to create a map column.

    udf(() => Map("x" -> 1.0, "y" -> -1.0))
    
  • Return a product (a tuple or an instance of a case class) to create a struct column.

    udf(() => (1.0, "foo", 5))
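
As a rough sketch of how the individual items might then be accessed, the snippet below applies the three UDFs above and pulls values out of each resulting column. The object name `UdfMultipleValues`, the helper names `withUdfColumns` and `flattened`, the column aliases, and the local `SparkSession` settings are all hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

// Hypothetical example object; names are illustrative only.
object UdfMultipleValues {
  // The three UDFs from the options above.
  val seqUdf   = udf(() => Seq(1.0, 2.0, 3.0))            // array column
  val mapUdf   = udf(() => Map("x" -> 1.0, "y" -> -1.0))  // map column
  val tupleUdf = udf(() => (1.0, "foo", 5))               // struct column with fields _1, _2, _3

  // One-row DataFrame holding one column per UDF.
  def withUdfColumns(spark: SparkSession): DataFrame =
    spark.range(1).select(
      seqUdf().as("arr"),
      mapUdf().as("m"),
      tupleUdf().as("t"))

  // Pull individual items out of each column with an extra select.
  def flattened(spark: SparkSession): DataFrame = {
    import spark.implicits._
    withUdfColumns(spark).select(
      $"arr"(0).as("first"), // array element by index
      $"m"("x").as("x"),     // map value by key
      $"t._1".as("a"),       // struct fields by name
      $"t._2".as("b"),
      $"t._3".as("c"))
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-multiple-values")
      .getOrCreate()
    import spark.implicits._

    flattened(spark).show()

    // Expand every field of the struct into top-level columns at once.
    withUdfColumns(spark).select($"t.*").show()

    spark.stop()
  }
}
```

Returning a case class instead of a tuple should give the struct fields the case class's field names rather than `_1`, `_2`, `_3`, which tends to make the later `select` easier to read.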
    
user7337271
  • 1
    Thanks. How about the second part of the question? My current solution is to add an additional select op to access individual items. Is there another way to flatten the returned values? – savx2 Dec 27 '16 at 03:29
  • I don't think so. – user7337271 Dec 28 '16 at 12:12
0

You can use the `struct` and `array` methods to achieve this.

Please check here for a very clean and easy answer by @ramesh

mythr