4

Okay so I have a table and in one column I have some data and the second column the average of the data. Example

id|Data|avg
1 |20  |20
2 |4   |12
3 |18  |14

How do I populate the avg column on insert with the running average of the Data column using T-SQL?

EDIT: Sorry guys, this was actually a stupid mistake I made. I assumed I had SQL 2014 but after trying Stephan's code and getting some errors, I went back to confirm and realize I use SQL 2008. Sorry for the misinformation. I have also updated the tags

MRainzo
  • 3,800
  • 2
  • 16
  • 25
  • @Stephan, there is nothing that says that the `avg` column is a calculated column in the sql server sense. Just that OP is trying to calculate it upon insert. – crthompson Apr 15 '15 at 17:47
  • 2
    Could you please tell us which version of SQL server you are using? There are ways to do this. – Matt Apr 15 '15 at 17:47
  • Not sure how to do it on insert (not sure that it is a great idea from an I/O perspective if your insert is blocking other queries) but this question (specifically the 2nd part of the accepted answer) is an elegant solution to the problem you are trying to address: http://stackoverflow.com/questions/26618353/t-sql-calculate-moving-average – JP. Apr 15 '15 at 17:53

4 Answers4

6

On insert, assuming id is an identity and you are just putting in data:

insert into table t(id, data, avg)
    select @data, @data * (1.0 / n) + avg * (n - 1.0)/n
    from (select count(*) as cnt, avg(data) as avg
          from t
         ) t;

In SQL Server 2012+, it is easy enough just to get it on output:

select t.*, avg(data) over (order by id) as cume_avg
from table t

Prior to SQL Server 2012, you would do this with a correlated subquery or apply:

select t.*,
       (select avg(data)
        from table t2
        where t2.id <= t.id
       ) as cume_avg 
from table t;

Here performance might suffer if the table is large. However, an index on id, data would help.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    If we only set the avg during an insert, then when the Data column of these records is updated, the avg column will become invalid. Example: if the Data for id:2 changes from 4 to 14, the avg for id:2 will still show 12 instead of 17 ([20+14]/2). Gordon's 2nd solution as a view would allow you to update data in your source table without the risk of invalidating the avg column, while giving you simple access to the avg via the new view. – Erik Blessman Apr 15 '15 at 18:01
5

Gordon Linoff has it on insert. If you want to do it with a trigger

Trigger Method

IF OBJECT_ID('myTable') IS NOT NULL
    DROP TABLE myTable;

CREATE TABLE myTable(ID INT, Data INT,[avg] INT);
GO

CREATE TRIGGER trg_running_avg ON myTable
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO myTable
        SELECT ID,Data,AVG(Data) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
        FROM inserted
END

INSERT INTO myTable(ID,Data)
VALUES  (1,20),(2,4),(3,18)

SELECT *
FROM myTable

View method

CREATE VIEW vw_average
AS
SELECT ID,Data,AVG(Data) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
        FROM inserted

Update Pre-Inserted Values with Self-Join

UPDATE myTable
SET avg = running_avg
FROM myTable A
INNER JOIN (SELECT ID,AVG(Data) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) running_avg FROM myTable) B
ON A.ID = B.ID

Update Pre-Inserted Values with CTE

WITH CTE_Update
AS 
(
    SELECT  ID,
            [avg] OldAvg,
            AVG(Data) OVER (ORDER BY ID) AS NewAvg
    FROM myTable
) 
UPDATE CTE_Update SET OldAvg = NewAvg
Stephan
  • 5,891
  • 1
  • 16
  • 24
  • Your trigger method has a flaw. Try `INSERT myTable VALUES (0,-1000000,DEFAULT)` after you have populated the table. The running average for all rows where `ID>0` is now very wrong. – Anon Apr 15 '15 at 18:24
  • The trigger is for only inserting new ID. Not inserting values that precede before. Problems like old values and not wanting to update your table every time is why I recommended using a view. – Stephan Apr 15 '15 at 19:11
  • The new ID being inserted may have a lower value than existing IDs. Your trigger should account for this possibility. – Anon Apr 15 '15 at 20:14
  • Also, your update statement has an unnecessary self-join. This does the same thing much faster: `WITH t(old,new) AS (SELECT ID,AVG(Data) OVER (ORDER BY ID) FROM myTable) UPDATE t SET old = new` – Anon Apr 16 '15 at 16:04
  • Actually, that would only update your CTE "t". The self join is necessary because windows functions are only allowed in the SELECT and ORDER BY clauses. You CANNOT use it in the SET clause – Stephan Apr 16 '15 at 16:25
  • Try it before declaring it doesn't work. You are mistaken as to how CTEs operate. http://sqlfiddle.com/#!6/3e1a6/2 – Anon Apr 16 '15 at 16:38
  • I'm sorry, it does work. What threw me off is that your script updates the data column instead of the avg column. I'll update my answer. – Stephan Apr 16 '15 at 16:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75420/discussion-between-stephan-and-anon). – Stephan Apr 16 '15 at 16:48
3

SQL Server <=2008 doesn't have the OVER(ORDER BY ...) clause for aggregate functions.

CREATE TRIGGER trg_running_avg ON myTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
  UPDATE old
    SET avg = new_avg
  FROM myTable old
  CROSS APPLY (
    SELECT AVG(Data) AS new_avg FROM myTable WHERE ID <= old.ID
  ) new
  --Skip the full table update. Start from the lowest ID that was changed.
  WHERE id >= (SELECT MIN(id) FROM (SELECT ID FROM inserted UNION ALL SELECT ID FROM deleted) t)
END
GO

Use a view for this if you can. It's a bad design for a change in one row to invalidate data stored in other rows. Rows should represent independent facts.

Anon
  • 10,660
  • 1
  • 29
  • 31
0

I feel like this should work with a self join:

select t1.id, t1.data, sum(t2.data)/t1.id as avg
from table t1, table t2
where t1.id>=t2.id group by t1.id

join will give:

t1.id|t1.Data|t2.id|t2.Data
1    |  20   |  1  |   20
2    |  4    |  1  |   20
2    |  4    |  2  |   4
3    |  18   |  1  |   20
3    |  18   |  2  |   4
3    |  18   |  3  |   18
aaa11
  • 1