How to aggregate values in two columns in multiple records into one

Question

I am on MS SQL Server.

I have a few records like this:

id   |   firstname     | startpoint   | endpoint   |    lastname     
---------------------------------------------------------------------
1    |   james         |  A           |   B        |    bond         
2    |   james         |  B           |   C        |    bond         
3    |   james         |  X           |   Z        |    bond

the only difference between these records are startpoint and endpoint

I would like to see something like this:

firstname      | startAndEndPoint    | lastname
-------------------------------------------------
james          | A, B, C, X, Z       | bond

I've tried select concat(startpoint, ', ', endpoint) as startAndEndPoint from table where lastname = 'bond' but it still gives me the result like this:

firstname     | startAndEndPoint    | lastname
james         | A, B                | bond
james         | B, C                | bond
james         | X, Z                | bond

I believe array_agg() function can help me according to this post but it appears that function is available in postgres, not sql server.

What if Mr. Bond had a third row with `X` and `Z`... Would that constitute a separate rows in the result? — The Impaler, Jul 10 '20 at 18:12
@TheImpaler the only difference between these two rows is the id, startpoint and endpoint. every other column is the same. If Mr. Bond had a third row with X, and Z, the startAndEndPoint would look like A,B,C,X,Z. Updated the question to reflect. — surprised_ferret, Jul 10 '20 at 18:13

Gordon Linoff · Accepted Answer · 2020-07-10T18:24:53.360

You can unpivot and aggregate:

select firstname, lastname, string_agg(pt, ', ') as points
from (select t.*, v.pt,
             row_number() over (partition by firstname, lastname, pt order by pt) as seqnum
      from t cross apply
           (values (t.startpoint), (t.endpoint)) as v(pt)
     ) t
where seqnum = 1
group by firstname, lastname;

Unfortunately, string_agg() doesn't support distinct. However, this is easily remedied by using row_number().

Edit:

If you wanted to identify each separate connected component, then you can use a recursive CTE:

with cte as (
      select id, firstname, lastname,
             convert(varchar(max), concat(startpoint, ', ', endpoint)) as points,
             endpoint
      from t
      where not exists (select 1 from t t2 where t2.endpoint = t.startpoint)
      union all
      select cte.id, cte.firstname, cte.lastname,
             concat(cte.point, ', ', cte.endpoint), t.endpoint
      from cte join
           t
           on t.startpoint = cte.endpoint and t.id = cte.id
    )
select *
from cte;

Here is a db<>fiddle.

dang it, i made a bad assumption -- one other (update_datetime) column values are different. :( your query is probably returning an empty result, maybe due to the update_datetime being different — surprised_ferret, Jul 10 '20 at 18:24
@thetinybeaker . . . The query works fine and answers the question that you asked. I have included a db<>fiddle illustrating this. I would suggest that you ask a *new* question if you need to modify the problem. — Gordon Linoff, Jul 10 '20 at 18:26
Ah, I just fiddled with the db fiddle, it still works!!! Thank you so much. — surprised_ferret, Jul 10 '20 at 18:33

How to aggregate values in two columns in multiple records into one

1 Answers1