5

I want to write a function that calculates a simple trimmed mean calculation in MySQL. The function will (obviously) be an aggregate function. I am new to writing functions etc in MySQL so could do with some help.

The algorithm of the trimmed mean will be as follows (pseudocode):

CREATE AGGREGATE FUNCTION trimmed_mean(elements DOUBLE[], trim_size INTEGER)
RETURNS DOUBLE
BEGIN
   -- determine number of elements
   -- ensure that number of elements is greater than 2 * trim_size else return error
   -- order elements in ASC order
   -- chop off smallest trim_size elements and largest trim_size elements
   -- calculate arithmetic average of the remaining elements
   -- return arithmetic average
END

Can anyone help with how to write the function above correctly, for use with MySQL?

Milimetric
  • 13,411
  • 4
  • 44
  • 56
Homunculus Reticulli
  • 65,167
  • 81
  • 216
  • 341
  • 2
    Is there a specific reason that you want to do this as a function, rather than as a query? Also, given that you preferred an answer to a previous question because it used standard SQL, will you need to be able to use this across multiple different RDBMSs (ie. not just MySQL)? –  Jan 18 '12 at 10:34
  • @MarkBannister I intended to work with PG (my favorite db!), but I had to jump through too many hoops to get PG to work with PHP (recompiling PHP [or similar crazy asks] etc), so I opted for mySQL which I already have working with PHP. The reason I wanted it as a function is that I want to return a trimmed mean as a column in a query. I suppose (if I had an SQL solution), I could hack together some SQL to 'paste' the trimmed mean values as a column to my returned dataset. – Homunculus Reticulli Jan 18 '12 at 11:17
  • @MarkBannister: short answer to your question. An ANSI SQL version would be ideal. But since I happen to be working with mySQL, then if I __have__ to be db-centric, a MySQL flavored SQL will take precedence. – Homunculus Reticulli Jan 18 '12 at 11:24
  • I think, the question might be useful: http://stackoverflow.com/questions/8639073/how-can-i-return-the-numerical-boxplot-data-of-all-results-using-1-mysql-query – newtover Jan 18 '12 at 12:53

3 Answers3

1

That's no small task, you need to write it in c/c++...


An option within MySQL itself, is to write a view or scalar function that aggregates the data how you want to, but from a specific table. This obviously restricts the function to a single source table, which may not be ideal.

A way around this could be to have a table dedicated to this function...

  • start a transaction
  • clear the table
  • insert your sample data
  • query the view/function

(Or something similar)

This precludes GROUP BY variations, unless you use dynamic sql or pass parameters to your function for specific grouping patterns.

It's all less than ideal, sorry.

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • I don't mind going down the C/C++ route (as a last resort) - but I'd rather not, simply because I don't want to spend hours familiarising myself with mySQL internal data types etc. If there is a 'hello world' example for an aggregate written in C/C++, that would be a very good starting point (in reducing the learning curve), since what I want to do, is relatively trivial (the algorithm part that is). – Homunculus Reticulli Jan 18 '12 at 11:20
  • In the link from Dems: *"The MySQL source distribution includes a file sql/udf_example.c that defines 5 new functions."* – ypercubeᵀᴹ Jan 18 '12 at 11:31
1

Have a look at this example (for MySQL) -

Create test table:

CREATE TABLE test_table (
  id INT(11) NOT NULL AUTO_INCREMENT,
  value INT(11) DEFAULT NULL,
  PRIMARY KEY (id)
);

INSERT INTO test_table(value) VALUES 
  (10), (2), (3), (5), (4), (7), (1), (9), (3), (5), (9);

Let's calculate avg value (edited variant):

SET @trim_size = 3;

SELECT AVG(value) avg FROM (
  SELECT value, @pos:=@pos + 1 pos FROM (SELECT * FROM test_table ORDER BY value) t1, (SELECT @pos:=0) t2
  ) t
WHERE pos > @trim_size AND pos <= @pos - @trim_size;

+--------+
| avg    |
+--------+
| 4.8000 |
+--------+
Devart
  • 119,203
  • 23
  • 166
  • 186
  • Surely it ought to be 4.8 - the trim should eliminate one but not both of the 3s? ie: ( not(1 + 2 + 3) + 3 + 4 + 5 + 5 + 7 + not(9 + 9 + 10) ) / 5 –  Jan 18 '12 at 12:37
  • @Mark Bannister You are right. I have missed the point - order elements in ASC order. It should be ordered by `value` field. I have edited query. Thanks;-) – Devart Jan 18 '12 at 15:02
0

I created a UDF that does exactly this, matching the Excel function referenced here https://support.office.com/en-us/article/trimmean-function-d90c9878-a119-4746-88fa-63d988f511d3

https://github.com/StirlingMarketingGroup/mysql-trimmean

Usage

`trimmean` ( `NumberColumn`, double `Percent` [, integer `Decimals` = 4 ] )
  • `NumberColumn`

    • The column of values to trim and average.
  • `Percent`

    • The fractional number of data points to exclude from the calculation. For example, if percent = 0.2, 4 points are trimmed from a data set of 20 points (20 x 0.2): 2 from the top and 2 from the bottom of the set.
  • `Decimals`

    • Optionally, the number of decimal places to output. Default is 4.
Brian Leishman
  • 8,155
  • 11
  • 57
  • 93