Aggregate and count disctinct values with a join

Question

This question is a mix of those duplicates:

I just can't manage to make it work all at once.

I have two tables :

TABLE A
IntervalId    Starts  Ends
-------------------------
1              0      10
2             10      25
3             25      32
4             32      40

TABLE B
Id            ErrorType    Starts    Ends
----------------------------------------
1                   666       0      25
2                   666       10     32
3                   777       0      32
4                   666       25     40
5                   777       10     25

Starting from the time intervals in table B, I'm trying to count and list, in each interval, the error types that might have happened during that interval. And remove duplicates.

Please note that there isn't any Start or End in table B that does not exist in Table A (Table A is generated from them).

the result with duplicates would be this :

Starts      Ends        ErrorsCount      Errors
-----------------------------------------------
0           10          2                 666, 777
10          25          4                 666, 666, 777, 777
25          32          3                 666, 777, 666
32          40          1                 666

The result I'm looking for is without duplicates:

Starts      Ends        DistinctErrorsCnt  DistinctErrors
-----------------------------------------------
0           10          2                 666, 777
10          25          2                 666, 777
25          32          2                 666, 777
32          40          1                 666

Here is my attemot, but I can't understand how to get ErrorType out of the bit that does the "distinct" without SQL server complaining that it's not in an aggregate or group by. Or, as soon as I put it into a Group by, then all the different Error Types are wiped by the first one that comes around. I end up with only 666 everywhere.

SELECT 
    IntervalId,
    Starts,
    Ends,
    COUNT([TableB].ErrorType) as DistinctErrorsCnt,
    DistinctErrors= STRING_AGG([TableB].ErrorType, ',')
FROM
(
    SELECT DISTINCT  
        [TableA].IntervalId,

    FROM TableB LEFT JOIN TableA ON
        (
            [TableA].Starts= [TableB].Starts
        OR [TableA].Ends = [TableB].Ends
        OR ([TableA].Starts >= [TableB].Starts AND [TableA].Ends <= [TableB].Ends)

        )
    GROUP BY 
        [TableA].IntervalId,
        [TableA].Starts, 
        [TableA].Ends, 
) NoDuplicates
GROUP BY 
    NoDuplicates.IntervalId,
    NoDuplicates.Starts, 
    NoDuplicates.Starts

Again: This is not syntactically correct, for the reason I explained above.

STRING_AGG(DISTINCT...) does not exist in SQL server – jeancallisti Mar 17 '20 at 19:27 — jeancallisti, Mar 17 '20 at 19:27

score 0 · Accepted Answer · answered Mar 17 '20 at 19:01

You can use aggregation:

select 
    a.starts,
    a.ends,
    count(distinct b.errorType) DistinctErrorsCnt,
    string_agg(b.errorType, ', ') within group(order by b.starts) DistinctErrors
from tablea a
inner join tableb b on b.starts >= a.ends and b.ends <= a.start
group by a.intervalId, a.start, a.end

If you want to avoid duplicates, you could use a subquery, or better yet, cross apply:

select
    a.starts,
    a.ends,
    count(*) DistinctErrorsCnt,
    string_agg(b.errorType, ', ') within group(order by b.starts) DistinctErrors
from tablea a
cross apply (
    select distinct errorType from tableb b where b.starts >= a.ends and b.ends <= a.start
) b
group by a.intervalId, a.start, a.end

Despite the alias b being used twice and b.starts requiring to be replaced with a.starts on the line with the string_agg, this works. Thanks! — jeancallisti, Mar 17 '20 at 19:28

Aggregate and count disctinct values with a join

1 Answers1