
As of Spark 1.5.0, it seems possible to write your own UDAFs for custom aggregations on DataFrames: Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs

It is unclear to me, however, whether this functionality is supported in the Python API.

kentt
    No, it is not supported. You can call Scala UDAF but it is not pretty. See [my answer](http://stackoverflow.com/a/33257733/1560062) to [Spark: How to map Python with Scala or Java User Defined Functions?](http://stackoverflow.com/q/33233737/1560062) for a full example. – zero323 Nov 03 '15 at 15:17
  • @zero323 So is it now available in Spark 1.6 or 1.6.1? – stackit May 18 '16 at 11:13
  • @stackit Neither 1.6.x nor 2.0. – zero323 May 31 '16 at 14:01
  • Possible duplicate of [Spark: How to map Python with Scala or Java User Defined Functions?](https://stackoverflow.com/questions/33233737/spark-how-to-map-python-with-scala-or-java-user-defined-functions) – eje Jul 07 '17 at 18:10

1 Answer


You cannot define a Python UDAF in Spark 1.5.0-2.0.0. There is a JIRA tracking this feature request:

It was resolved with goal "later", so it probably won't happen anytime soon.

You can use a Scala UDAF from PySpark, though - this is described in Spark: How to map Python with Scala or Java User Defined Functions?
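The wiring for that workaround looks roughly like the sketch below. It assumes a hypothetical Scala `UserDefinedAggregateFunction` class `com.example.MyUDAF`, compiled into a jar that you ship with `--jars`, and an existing `sqlContext`; all of those names are placeholders, not anything provided by Spark itself, so adapt them to your own build.

```python
# Sketch: invoking a Scala UDAF from PySpark via the py4j JVM gateway.
# com.example.MyUDAF is a hypothetical Scala class you must compile and
# ship yourself (e.g. with --jars); sqlContext is assumed to exist.
from pyspark.sql.column import Column, _to_java_column, _to_seq

def my_udaf(col):
    sc = sqlContext._sc
    # Instantiate the Scala UserDefinedAggregateFunction on the JVM side.
    jvm_udaf = sc._jvm.com.example.MyUDAF()
    # Apply it to the Java-converted input column and wrap the result
    # back into a Python Column so it can be used in agg()/select().
    return Column(jvm_udaf.apply(_to_seq(sc, [col], _to_java_column)))

# Usage (hypothetical): df.groupBy("key").agg(my_udaf(df["value"]))
```

This relies on private helpers (`_to_java_column`, `_to_seq`, `_jvm`), so it can break between Spark versions; see the linked answer for a complete, version-specific example.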

Community