7

I need to sum points on each level earned by a tree of users. Level 1 is the sum of users' points of the users 1 level below the user. Level 2 is the Level 1 points of the users 2 levels below the user, etc...

The calculation happens once a month on a non production server, no worries about performance.

What would the SQL look like to do it?

If you're confused, don't worry, I am as well!

User table:

ID    ParentID    Points
1     0           230
2     1           150
3     0           80
4     1           110
5     4           54
6     4           342

Tree:
0
|---\
1    3
| \
2  4---
    \  \
     5  6

Output should be:

ID    Points    Level1     Level2
1     230       150+110    150+110+54+342
2     150
3     80
4     110       54+342
5     54
6     342

SQL Server Syntax and functions preferably...

Jrgns
  • 24,699
  • 18
  • 71
  • 77

9 Answers9

2

If you were using Oracle DBMS that would be pretty straightforward since Oracle supports tree queries with the CONNECT BY/STARTS WITH syntax. For SQL Server I think you might find Common Table Expressions useful

Manrico Corazzi
  • 11,299
  • 10
  • 48
  • 62
2

Trees don't work well with SQL. If you have very (very very) few write accesses, you could change the tree implementation to use nested sets, that would make this query incredibly easy.

Example (if I'm not mistaken):

SELECT SUM(points) 
FROM users 
where left > x and right < y 

However, any changes on the tree require touching a massive amount of rows. It's probably better to just do the recursion in you client.

C B
  • 1,677
  • 6
  • 18
  • 20
Matthias Winkelmann
  • 15,870
  • 7
  • 64
  • 76
1

I would say: create a stored procedure, probably has the best performance. Or if you have a maximum number of levels, you could create subqueries, but they will have a very poort performance.

(Or you could get MS SQL Server 2008 and get the new hierarchy functions... ;) )

Grad van Horck
  • 4,488
  • 1
  • 21
  • 22
1

If you are working with trees stored in a relational database, I'd suggest looking at "nested set" or "modified preorder tree traversal". The SQL will be as simple as that:

SELECT id, 
       SUM(value) AS value 
FROM table 
WHERE left>left\_value\_of\_your\_node 
  AND right<$right\_value\_of\_your\_node;

... and do this for every node you are interested in.

Maybe this will help you: http://www.dbazine.com/oracle/or-articles/tropashko4 or use google.

C B
  • 1,677
  • 6
  • 18
  • 20
Matthias Kestenholz
  • 3,300
  • 1
  • 21
  • 26
1

SQL in general, like others said, does not handle well such relations. Typically, a surrogate 'relations' table is needed (id, parent_id, unique key on (id, parent_id)), where:

  • every time you add a record in 'table', you:

    INSERT INTO relations (id, parent_id) VALUES ([current_id], [current_id]);

    INSERT INTO relations (id, parent_id) VALUES ([current_id], [current_parent_id]);

    INSERT INTO relations (id, parent_id) SELECT [current_id], parent_id FROM relations WHERE id = [current_parent_id];

  • have logic to avoid cycles

  • make sure that updates, deletions on 'relations' are handled with stored procedures

Given that table, you want:

SELECT rel.parent_id, SUM(tbl.points)
FROM table tbl INNER JOIN relations rel ON tbl.id=rel.id
WHERE rel.parent_id <> 0
GROUP BY rel.parent_id;
tzot
  • 92,761
  • 29
  • 141
  • 204
1

Ok, this gives you the results you are looking for, but there are no guarantees that I didn't miss something. Consider it a starting point. I used SQL 2005 to do this, SQL 2000 does not support CTE's

WITH Parent (id, GrandParentId, parentId, Points, Level1Points, Level2Points)
AS
(
    -- Find root
    SELECT id,  
            0 AS GrandParentId,
            ParentId,
            Points,
            0 AS Level1Points,
            0 AS Level2Points
    FROM tblPoints ptr
    WHERE ptr.ParentId = 0

    UNION ALL (
    -- Level2 Points
    SELECT pa.GrandParentId AS Id,
            NULL AS GrandParentId,
            NULL AS ParentId,
            0 AS Points, 
            0 AS Level1Points,
            pa.Points  AS Level2Points
    FROM tblPoints pt
            JOIN Parent pa ON pa.GrandParentId = pt.Id 
    UNION  ALL
    -- Level1 Points
    SELECT pt.ParentId AS Id,
            NULL AS GrandParentId,
            NULL AS ParentId,
            0 AS Points, 
            pt.Points AS Level1Points,
            0 AS Level2Points
    FROM tblPoints pt
            JOIN Parent pa ON pa.Id = pt.ParentId AND pa.ParentId IS NOT NULL 
    UNION  ALL
    -- Points
    SELECT pt.id,
            pa.ParentId AS GrandParentId,
            pt.ParentId,
            pt.Points, 
            0 AS Level1Points,
            0 AS Level2Points
    FROM tblPoints pt
            JOIN Parent pa ON pa.Id = pt.ParentId AND pa.ParentId IS NOT NULL )
)
SELECT id, 
    SUM(Points) AS Points,  
    SUM(Level1Points) AS Level1Points,
    CASE WHEN SUM(Level2Points) > 0 THEN  SUM(Level1Points) + SUM(Level2Points) ELSE 0 END AS Level2Points
FROM Parent
GROUP BY id 
ORDER by id
Darrel Miller
  • 139,164
  • 32
  • 194
  • 243
0

The following table:

Id   ParentId
1   NULL
11    1
12    1
110 11
111 11
112 11
120 12
121 12
122 12
123 12
124 12

And the following Amount table:

Id     Val
110 500
111 50
112 5
120 3000
121 30000
122 300000

Only the leaves (last level) Id's have a value defined. The SQL query to get the data looks like:

;WITH Data (Id, Val) AS
(
    select t.Id, SUM(v.val) as Val from dbo.TestTable t
    join dbo.Amount v on t.Id = v.Id
    group by t.Id
)

select cd.Id, ISNULL(SUM(cd.Val), 0) as Amount FROM
(
    -- level 3
    select t.Id, d.val from TestTable t
    left join Data d on d.id = t.Id

    UNION

    -- level 2
    select t.parentId as Id, sum(y.Val) from TestTable t
    left join Data y on y.id = t.Id
    where t.parentId is not null
    group by t.parentId

    UNION

    -- level 1
    select t.parentId as Id, sum(y.Val) from TestTable t
    join TestTable c on c.parentId = t.Id
    left join Data y on y.id = c.Id
    where t.parentId is not null
    group by t.parentId
) AS cd
group by id

this results in the output:

Id     Amount
1     333555
11   555
12   333000
110 500
111 50
112 5
120 3000
121 30000
122 300000
123 0
124 0

I hope this helps.

Stef Heyenrath
  • 9,335
  • 12
  • 66
  • 121
0

You have a couple of options:

  1. Use a cursor and a recursive user-defined function call (it's quite slow)
  2. Create a cache table, update it on INSERT using a trigger (it's the fastest solution but could be problematic if you have lots of updates to the main table)
  3. Do a client-side recursive calculation (preferable if you don't have too many records)
Ilya Kochetov
  • 17,988
  • 6
  • 44
  • 60
0

You can write a simple recursive function to do the job. My MSSQL is a little bit rusty, but it would look like this:

CREATE FUNCTION CALC
(
@node integer,
)
returns 
(
@total integer
)
as
begin
    select @total = (select node_value from yourtable where node_id = @node);

    declare @children table (value integer);
    insert into @children   
    select calc(node_id) from yourtable where parent_id = @node;

    @current = @current + select sum(value) from @children;
    return
end
Milan Babuškov
  • 59,775
  • 49
  • 126
  • 179
  • Ok, what will the function look like? – Jrgns Sep 18 '08 at 10:25
  • I don't have MSSQL instalation here, but it would go something like: getsum(parentNode int) sum = select value where node = parentNode; foreach row in select children from table where parent = parentNode sum = sum + getsum(childnode) You would call it on the top node. – Milan Babuškov Sep 18 '08 at 10:39