
I'm looking for a list of pre-defined aggregation functions in Spark SQL. I have in mind something analogous to Presto Aggregate Functions.

I Ctrl+F'd around a little in the SQL API docs to no avail... it's also hard to tell at a glance which functions are for aggregation vs. not. For example, if I didn't know avg is an aggregation function I'd be hard pressed to tell it is one (in a way that's actually scalable to the full set of functions):

avg - avg(expr) - Returns the mean calculated from values of a group.

If such a list doesn't exist, can someone at least confirm to me that there's no pre-defined function like any/bool_or or all/bool_and to determine if any or all of a boolean column in a group are true (or false)?

For now, my workaround is

select grp_col, count(if(bool_col, true, NULL)) > 0 any_agg
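To make the workaround concrete, here is a minimal sketch of the same COUNT-based idea, extended with an "all" counterpart. It is only an illustration under a few assumptions: it runs against Python's built-in `sqlite3` rather than Spark, the table name `tbl` and the helper `bool_aggregates` are made up for the example, and `CASE WHEN ... END` is used in place of `if(...)` because it is portable SQL. Note that NULLs are ignored, so a group with no non-NULL values reports `all_agg` as true, which may differ from how a real `bool_and` behaves.

```python
import sqlite3


def bool_aggregates(rows):
    """Return (grp_col, any_agg, all_agg) per group using only COUNT.

    `rows` is an iterable of (grp_col, bool_col) pairs; bool_col may be None.
    The table name `tbl` is hypothetical and exists only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (grp_col TEXT, bool_col INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
        SELECT grp_col,
               -- any: at least one true value in the group
               COUNT(CASE WHEN bool_col THEN 1 END) > 0     AS any_agg,
               -- all: no false value in the group (NULLs are skipped)
               COUNT(CASE WHEN NOT bool_col THEN 1 END) = 0 AS all_agg
        FROM tbl
        GROUP BY grp_col
        ORDER BY grp_col
    """
    # SQLite returns 0/1 for comparisons, so convert to Python bools.
    result = [(g, bool(a), bool(b)) for g, a, b in conn.execute(query)]
    conn.close()
    return result


sample = [
    ("a", True), ("a", False),
    ("b", False), ("b", False),
    ("c", True), ("c", None),
]
print(bool_aggregates(sample))
```

The two COUNT expressions are mirror images: "any" counts the true rows and checks for at least one, while "all" counts the false rows and checks for none.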
MichaelChirico

2 Answers


Take a look at the Aggregate functions section of the Spark docs.

Euclides

The list of functions is here, under RelationalGroupedDataset; specifically, the APIs that return a DataFrame (not a RelationalGroupedDataset):

https://spark.apache.org/docs/latest/api/scala/index.html?org/apache/spark/sql/RelationalGroupedDataset.html#org.apache.spark.sql.RelationalGroupedDataset


WestCoastProjects
  • There seems to be no boolean aggregate like [`bool_or(x)`](https://www.postgresql.org/docs/current/functions-aggregate.html)... What is the best practice for implementing one? This [example](https://stackoverflow.com/a/22518890/287948) does not work, and converting to a List seems ugly... It would need a method like `sql("(select true as x) union (select false as x)").rowsFoldLeft(true)(_ && _)`. – Peter Krauss Sep 06 '19 at 19:37
  • Well, for booleans there are simple workarounds using the same union DataFrame, e.g. `sameDataframe.filter("x").first`, but the question is about aggregation functions. – Peter Krauss Sep 06 '19 at 20:12
  • Can I upvote my own answer? I'm coming back here 2 years later, having completely forgotten about this Q&A. Argh, the link is broken: the Spark maintainers have some work to do. – WestCoastProjects May 06 '21 at 02:36