Get the running average of a column in T-SQL

Question

Okay so I have a table and in one column I have some data and the second column the average of the data. Example

id|Data|avg
1 |20  |20
2 |4   |12
3 |18  |14

How do I populate the avg column on insert with the running average of the Data column using T-SQL?

EDIT: Sorry guys, this was actually a stupid mistake I made. I assumed I had SQL 2014 but after trying Stephan's code and getting some errors, I went back to confirm and realize I use SQL 2008. Sorry for the misinformation. I have also updated the tags

@Stephan, there is nothing that says that the `avg` column is a calculated column in the sql server sense. Just that OP is trying to calculate it upon insert. — crthompson, Apr 15 '15 at 17:47
Could you please tell us which version of SQL server you are using? There are ways to do this. — Matt, Apr 15 '15 at 17:47
Not sure how to do it on insert (not sure that it is a great idea from an I/O perspective if your insert is blocking other queries) but this question (specifically the 2nd part of the accepted answer) is an elegant solution to the problem you are trying to address: http://stackoverflow.com/questions/26618353/t-sql-calculate-moving-average — JP., Apr 15 '15 at 17:53

Gordon Linoff · Answer 1 · 2015-04-15T19:10:54.580

6

On insert, assuming id is an identity and you are just putting in data:

insert into table t(id, data, avg)
    select @data, @data * (1.0 / n) + avg * (n - 1.0)/n
    from (select count(*) as cnt, avg(data) as avg
          from t
         ) t;

In SQL Server 2012+, it is easy enough just to get it on output:

select t.*, avg(data) over (order by id) as cume_avg
from table t

Prior to SQL Server 2012, you would do this with a correlated subquery or apply:

select t.*,
       (select avg(data)
        from table t2
        where t2.id <= t.id
       ) as cume_avg 
from table t;

Here performance might suffer if the table is large. However, an index on id, data would help.

edited Apr 15 '15 at 19:10

answered Apr 15 '15 at 17:51

Gordon Linoff

1,242,037
58
646
786

1

If we only set the avg during an insert, then when the Data column of these records is updated, the avg column will become invalid. Example: if the Data for id:2 changes from 4 to 14, the avg for id:2 will still show 12 instead of 17 ([20+14]/2). Gordon's 2nd solution as a view would allow you to update data in your source table without the risk of invalidating the avg column, while giving you simple access to the avg via the new view. – Erik Blessman Apr 15 '15 at 18:01

Stephan · Answer 2 · 2015-04-16T16:48:24.810

5

Gordon Linoff has it on insert. If you want to do it with a trigger

Trigger Method

IF OBJECT_ID('myTable') IS NOT NULL
    DROP TABLE myTable;

CREATE TABLE myTable(ID INT, Data INT,[avg] INT);
GO

CREATE TRIGGER trg_running_avg ON myTable
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO myTable
        SELECT ID,Data,AVG(Data) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
        FROM inserted
END

INSERT INTO myTable(ID,Data)
VALUES  (1,20),(2,4),(3,18)

SELECT *
FROM myTable

View method

CREATE VIEW vw_average
AS
SELECT ID,Data,AVG(Data) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
        FROM inserted

Update Pre-Inserted Values with Self-Join

UPDATE myTable
SET avg = running_avg
FROM myTable A
INNER JOIN (SELECT ID,AVG(Data) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) running_avg FROM myTable) B
ON A.ID = B.ID

Update Pre-Inserted Values with CTE

WITH CTE_Update
AS 
(
    SELECT  ID,
            [avg] OldAvg,
            AVG(Data) OVER (ORDER BY ID) AS NewAvg
    FROM myTable
) 
UPDATE CTE_Update SET OldAvg = NewAvg

edited Apr 16 '15 at 16:48

answered Apr 15 '15 at 17:59

Stephan

5,891
1
16
24

Your trigger method has a flaw. Try `INSERT myTable VALUES (0,-1000000,DEFAULT)` after you have populated the table. The running average for all rows where `ID>0` is now very wrong. – Anon Apr 15 '15 at 18:24
The trigger is for only inserting new ID. Not inserting values that precede before. Problems like old values and not wanting to update your table every time is why I recommended using a view. – Stephan Apr 15 '15 at 19:11
The new ID being inserted may have a lower value than existing IDs. Your trigger should account for this possibility. – Anon Apr 15 '15 at 20:14
Also, your update statement has an unnecessary self-join. This does the same thing much faster: `WITH t(old,new) AS (SELECT ID,AVG(Data) OVER (ORDER BY ID) FROM myTable) UPDATE t SET old = new` – Anon Apr 16 '15 at 16:04
Actually, that would only update your CTE "t". The self join is necessary because windows functions are only allowed in the SELECT and ORDER BY clauses. You CANNOT use it in the SET clause – Stephan Apr 16 '15 at 16:25
Try it before declaring it doesn't work. You are mistaken as to how CTEs operate. http://sqlfiddle.com/#!6/3e1a6/2 – Anon Apr 16 '15 at 16:38
I'm sorry, it does work. What threw me off is that your script updates the data column instead of the avg column. I'll update my answer. – Stephan Apr 16 '15 at 16:45
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75420/discussion-between-stephan-and-anon). – Stephan Apr 16 '15 at 16:48

Anon · Accepted Answer · 2015-04-15T18:58:30.737

3

SQL Server <=2008 doesn't have the OVER(ORDER BY ...) clause for aggregate functions.

CREATE TRIGGER trg_running_avg ON myTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
  UPDATE old
    SET avg = new_avg
  FROM myTable old
  CROSS APPLY (
    SELECT AVG(Data) AS new_avg FROM myTable WHERE ID <= old.ID
  ) new
  --Skip the full table update. Start from the lowest ID that was changed.
  WHERE id >= (SELECT MIN(id) FROM (SELECT ID FROM inserted UNION ALL SELECT ID FROM deleted) t)
END
GO

Use a view for this if you can. It's a bad design for a change in one row to invalidate data stored in other rows. Rows should represent independent facts.

edited Apr 15 '15 at 18:58

answered Apr 15 '15 at 18:41

Anon

10,660
1
29
31

He's using SQL 2014 according to his comment on his post – Stephan Apr 15 '15 at 19:07
I updated the question to show I was mistaken an am indeed using 2008. Sorry about the misinformation – MRainzo Apr 15 '15 at 19:22

score 0 · Answer 4 · answered May 02 '19 at 04:26

I feel like this should work with a self join:

select t1.id, t1.data, sum(t2.data)/t1.id as avg
from table t1, table t2
where t1.id>=t2.id group by t1.id

join will give:

t1.id|t1.Data|t2.id|t2.Data
1    |  20   |  1  |   20
2    |  4    |  1  |   20
2    |  4    |  2  |   4
3    |  18   |  1  |   20
3    |  18   |  2  |   4
3    |  18   |  3  |   18

Get the running average of a column in T-SQL

4 Answers4

Trigger Method

View method

Update Pre-Inserted Values with Self-Join

Update Pre-Inserted Values with CTE