Questions tagged [user-defined-aggregate]

38 questions
12
votes
2 answers

SQL user defined aggregate order of values preserved?

Im using the code from this MSDN page to create a user defined aggregate to concatenate strings with group by's in SQL server. One of my requirements is that the order of the concatenated values are the same as in the query. For example: Value …
Magnus
  • 45,362
  • 8
  • 80
  • 118
8
votes
2 answers

Why Mutable map becomes immutable automatically in UserDefinedAggregateFunction(UDAF) in Spark

I am trying to define a UserDefinedAggregateFunction(UDAF) in Spark, which counts the number of occurrences for each unique values in a column of a group. This is an example: Suppose I have a dataframe df like…
Fan L.
  • 139
  • 5
6
votes
1 answer

Can every Spark UDAF be used with Window?

I always thought that Spark does not allow to define User-Defined-Window-Functions. I just tested the "Geometric Mean" UDAF example from here (https://docs.databricks.com/spark/latest/spark-sql/udaf-scala.html) as a window function, and it seems to…
Raphael Roth
  • 26,751
  • 15
  • 88
  • 145
6
votes
1 answer

When does merge happen in User Defined Aggregating Functions UDAF in Spark

I want to know at which circumstances Spark will perform merge as part of the UDAF function. Motivation: I am using a lot of UDAF functions OVER a Window in my Spark project. Often I want to answer a question like: How many times a credit card…
astro_asz
  • 2,278
  • 3
  • 15
  • 31
4
votes
1 answer

Spark Scala: User defined aggregate function that calculates median

I´m trying to find a way, to calculate the Median for a given Dataframe. val df = sc.parallelize(Seq(("a",1.0),("a",2.0),("a",3.0),("b",6.0), ("b", 8.0))).toDF("col1", "col2") +----+----+ |col1|col2| +----+----+ | a| 1.0| | a| 2.0| | a|…
3
votes
2 answers

User-Defined Aggregate in SQL Server 2008 - How to deploy with MaxByteSize = -1?

I read here (and elsewhere) that it's possible, in SQL Server 2008, to build a user-defined aggregate which can return a string longer than 8000 characters. This is exactly what I need. Supposedly, the method is to set maxByteSize to -1 instead of…
DanM
  • 7,037
  • 11
  • 51
  • 86
3
votes
1 answer

Spark UDAF: How to get value from input by column field name in UDAF (User-Defined Aggregation Function)?

I am trying to use Spark UDAF to summarize two existing columns into a new column. Most of the tutorials on Spark UDAF out there use indices to get the values in each column of the input Row. Like this: input.getAs[String](1) , which is used in my…
3
votes
1 answer

Multiple column output in UDAF Spark

I get some data from my mongodb that looks like this: +------+-------+ | view | data | +------+-------+ | xx | *** | | yy | *** | | xx | *** | +------+-------+ It's not really necessary to know what…
Boendal
  • 2,496
  • 1
  • 23
  • 36
2
votes
1 answer

Why does MutableAggregationBuffer in UserDefinedAggregateFunction require a bufferSchema?

I am looking into implementing a UserDefinedAggregateFunction in spark and see that a bufferSchema is needed. I understand how to create it, but my issue is why does it require a bufferSchema? Should it not only need a size (number of elements for…
2
votes
2 answers

Make SQL Server CLR aggregate similar to native aggregates

I'm comparing my custom CLR aggregate vs AVG (SQL Server 2017). My queries are: SELECT groupId, Helpers.CustomCLR(value) FROM table group by groupId SELECT groupId, AVG(value) FROM table group by groupId And CLR is [Serializable] …
2
votes
2 answers

SQL CLR aggregate not terminating correctly when applied over huge amount of data

I have create and used a lot of times a SQL CLR aggregate which is concatenating values - it also order the values by specified number and use user input separator for concatenating the them. I have used the same aggregate over large amount of data…
gotqn
  • 42,737
  • 46
  • 157
  • 243
2
votes
1 answer

Direct arguments in PostgreSQL user-defined aggregate functions

I am creating a user-defined aggregate function that needs an additional parameter. More precisely it is a cumulative (aka window) minimum that takes as second parameter a time interval defining the window. Since the aggregate function operates on…
2
votes
1 answer

Cannot Pass Null Value to Custom Aggregate

Afternoon, I'm writing a custom median function (without looking at existing solutions, i like the challenge), after lots of fiddling I'm most of the way there. I cannot however pass in a column that contains a null value. I'm handling this in the…
2
votes
1 answer

Instantiate tuple value in Cassandra UDA function with map and tuple value (for daily average)

I am trying to create a function which counts and sums values by day (to later calculate the average). I got this far: CREATE OR REPLACE FUNCTION state_group_count_and_sum( state map>>, timestamp timestamp,…
2
votes
2 answers

Msg 6558: CREATE AGGREGATE failed because type 'Concatenate' does not conform to UDAGG specification

I've created a SQLCLR Assembly and added it, when I run the T-SQL command: CREATE AGGREGATE Concat (@input nvarchar(max)) RETURNS nvarchar(max) EXTERNAL NAME Sql_ClrAggregates.Concatenate; I get the error: Msg 6558, Level 16, State 1, Line 1 …
Stephen Turner
  • 7,125
  • 4
  • 51
  • 68
1
2 3