Questions tagged [user-defined-aggregate]
38 questions
12
votes
2 answers
SQL user defined aggregate order of values preserved?
Im using the code from this MSDN page to create a user defined aggregate to concatenate strings with group by's in SQL server. One of my requirements is that the order of the concatenated values are the same as in the query. For example:
Value …

Magnus
- 45,362
- 8
- 80
- 118
8
votes
2 answers
Why Mutable map becomes immutable automatically in UserDefinedAggregateFunction(UDAF) in Spark
I am trying to define a UserDefinedAggregateFunction(UDAF) in Spark, which counts the number of occurrences for each unique values in a column of a group.
This is an example:
Suppose I have a dataframe df like…

Fan L.
- 139
- 5
6
votes
1 answer
Can every Spark UDAF be used with Window?
I always thought that Spark does not allow to define User-Defined-Window-Functions. I just tested the "Geometric Mean" UDAF example from here (https://docs.databricks.com/spark/latest/spark-sql/udaf-scala.html) as a window function, and it seems to…

Raphael Roth
- 26,751
- 15
- 88
- 145
6
votes
1 answer
When does merge happen in User Defined Aggregating Functions UDAF in Spark
I want to know at which circumstances Spark will perform merge as part of the UDAF function.
Motivation:
I am using a lot of UDAF functions OVER a Window in my Spark project. Often I want to answer a question like:
How many times a credit card…

astro_asz
- 2,278
- 3
- 15
- 31
4
votes
1 answer
Spark Scala: User defined aggregate function that calculates median
I´m trying to find a way, to calculate the Median for a given Dataframe.
val df = sc.parallelize(Seq(("a",1.0),("a",2.0),("a",3.0),("b",6.0), ("b", 8.0))).toDF("col1", "col2")
+----+----+
|col1|col2|
+----+----+
| a| 1.0|
| a| 2.0|
| a|…

johntechendso
- 233
- 1
- 3
- 10
3
votes
2 answers
User-Defined Aggregate in SQL Server 2008 - How to deploy with MaxByteSize = -1?
I read here (and elsewhere) that it's possible, in SQL Server 2008, to build a user-defined aggregate which can return a string longer than 8000 characters. This is exactly what I need.
Supposedly, the method is to set maxByteSize to -1 instead of…

DanM
- 7,037
- 11
- 51
- 86
3
votes
1 answer
Spark UDAF: How to get value from input by column field name in UDAF (User-Defined Aggregation Function)?
I am trying to use Spark UDAF to summarize two existing columns into a new column. Most of the tutorials on Spark UDAF out there use indices to get the values in each column of the input Row. Like this:
input.getAs[String](1)
, which is used in my…

CyberPlayerOne
- 3,078
- 5
- 30
- 51
3
votes
1 answer
Multiple column output in UDAF Spark
I get some data from my mongodb that looks like this:
+------+-------+
| view | data |
+------+-------+
| xx | *** |
| yy | *** |
| xx | *** |
+------+-------+
It's not really necessary to know what…

Boendal
- 2,496
- 1
- 23
- 36
2
votes
1 answer
Why does MutableAggregationBuffer in UserDefinedAggregateFunction require a bufferSchema?
I am looking into implementing a UserDefinedAggregateFunction in spark and see that a bufferSchema is needed. I understand how to create it, but my issue is why does it require a bufferSchema? Should it not only need a size (number of elements for…

Ghastone
- 75
- 4
2
votes
2 answers
Make SQL Server CLR aggregate similar to native aggregates
I'm comparing my custom CLR aggregate vs AVG (SQL Server 2017). My queries are:
SELECT groupId, Helpers.CustomCLR(value)
FROM table
group by groupId
SELECT groupId, AVG(value)
FROM table
group by groupId
And CLR is
[Serializable]
…

user2820173
- 308
- 2
- 13
2
votes
2 answers
SQL CLR aggregate not terminating correctly when applied over huge amount of data
I have create and used a lot of times a SQL CLR aggregate which is concatenating values - it also order the values by specified number and use user input separator for concatenating the them.
I have used the same aggregate over large amount of data…

gotqn
- 42,737
- 46
- 157
- 243
2
votes
1 answer
Direct arguments in PostgreSQL user-defined aggregate functions
I am creating a user-defined aggregate function that needs an additional parameter. More precisely it is a cumulative (aka window) minimum that takes as second parameter a time interval defining the window. Since the aggregate function operates on…

Esteban Zimanyi
- 201
- 3
- 6
2
votes
1 answer
Cannot Pass Null Value to Custom Aggregate
Afternoon,
I'm writing a custom median function (without looking at existing solutions, i like the challenge), after lots of fiddling I'm most of the way there. I cannot however pass in a column that contains a null value. I'm handling this in the…

Ollie
- 243
- 2
- 9
2
votes
1 answer
Instantiate tuple value in Cassandra UDA function with map and tuple value (for daily average)
I am trying to create a function which counts and sums values by day (to later calculate the average). I got this far:
CREATE OR REPLACE FUNCTION state_group_count_and_sum( state map>>, timestamp timestamp,…

Roy van der Valk
- 527
- 1
- 6
- 18
2
votes
2 answers
Msg 6558: CREATE AGGREGATE failed because type 'Concatenate' does not conform to UDAGG specification
I've created a SQLCLR Assembly and added it, when I run the T-SQL command:
CREATE AGGREGATE Concat (@input nvarchar(max))
RETURNS nvarchar(max)
EXTERNAL NAME Sql_ClrAggregates.Concatenate;
I get the error:
Msg 6558, Level 16, State 1, Line 1
…

Stephen Turner
- 7,125
- 4
- 51
- 68