0

I have a SQL table like this showing queries users have run and the start time and end times of the queries.

+-----------------+----------+------------+----------+
| QUERY_TEXT      | USER_NAME| START_TIME | END_TIME |
+-----------------+----------+------------+----------+
| SELECT * FROM ..| analyst  | 03:02:22   | 03:05:35 |
| USE ..          | engineer | 03:03:22   | 03:03:24 |
| SELECT NAME ..  | engineer | 03:03:24   | 03:04:28 |
| UPDATE ..       | analyst  | 03:05:35   | 03:06:39 |
| SHOW ..         | admin    | 03:07:01   | 03:07:05 |
| INSERT ..       | analyst  | 03:06:39   | 03:08:22 |
| COMMIT;         | analyst  | 03:08:22   | 03:10:01 |
| DESCRIBE ..     | analyst  | 03:13:44   | 03:13:46 |
+-----------------+----------+------------+----------+

I want to combine the queries that are time consecutive to get the full scripts run by users. The end result should look like this

+------------------+----------+------------+----------+
| QUERY_TEXT       | USER_NAME| START_TIME | END_TIME |
+------------------+----------+------------+----------+
| SELECT .. COMMIT;| analyst  | 03:02:22   | 03:10:01 |
| USE ..           | engineer | 03:03:22   | 03:04:28 |
| SHOW ..          | admin    | 03:07:01   | 03:07:05 |
| DESCRIBE ..      | analyst  | 03:13:44   | 03:13:46 |
+------------------+----------+------------+----------+

I have found answers combining time consecutive states but not one which will concatenate the values of a column as well

musician1023
  • 55
  • 1
  • 7
  • 1
    Tag our question with the database you are using. And, unless users only one run query, you should show data where a user has multiple scripts that you want on separate rows. – Gordon Linoff Nov 25 '19 at 15:41
  • Just use LISTAGG or STRING_AGG or whatever function does this in on your platform instead of MIN or MAX that is how you concatinate across groups – Hogan Nov 25 '19 at 15:50

2 Answers2

1

This is island and gap problem and you will need to use analytical function as following:

Select listagg(query_text,';') within group (order by start_time) as query_text, 
       username, 
       min(start_time) as start_time, 
       max(end_time) as end_time
  From
  (Select t.*,
          Row_number() over (partition by username order by start_time) 
           - sum(case when start_time < prev_end_time or prev_end_time is null then 1 end) over (partition by username order by start_time) as grp
     From (select t.*, 
                  Lag(end_time) over (partition by username order by start_time) as prev_end_time
             From your_table t
          ) t
   )
Group by username, grp
Order by start_time;

Cheers!!

Popeye
  • 35,427
  • 4
  • 10
  • 31
  • Exactly what I was looking for. In my actual data there were milliseconds of difference between start_time and prev_end_time so I just made one small edit by changing the "case when" condition to "case when ABS(TIMESTAMPDIFF(MILLISECOND, start_time, prev_end_time)) < 350 or prev_end_time is null then 1 end". – musician1023 Nov 25 '19 at 19:38
0

Use the STRING_AGG() function. It takes 2 args- column name and delimiter.

Example using your columns:

select string_agg(query_text, ',') query_text, username, min(start_time), max(end_time)

...

group by user_name

Bar
  • 56
  • 4
  • Hi Bar welcome to stack overflow. For a question like this a good answer would include the actual SQL that is needed to concatinate the columns requested. This is much more like my comment (where I suggest using the function.) – Hogan Nov 25 '19 at 15:52