
I have a DB2 table with the following columns, which I need to return in the query:

ENTITY        START_TIME            END_TIME              NUMBER           BYTES
SERVER1SQL    2020-03-29 23:03:04   2020-03-30 01:52:37   761102     72212891243
SERVER1       2020-03-31 00:00:30   2020-03-31 01:33:40   765443      4532123432
SERVER2       2020-03-31 01:00:10   2020-03-31 01:40:12   765831     19531321013
SERVER19      2020-03-31 00:20:30   2020-03-31 02:10:40   765955      5212347991
SERVER7       2020-03-31 02:00:29   2020-03-31 02:33:10   766121      2321956753
SERVER1SQL    2020-03-30 23:00:50   2020-03-31 03:40:18   764892     72212891243
SERVER11      2020-03-31 03:06:02   2020-03-31 04:05:40   766337       688174321

I need to find the record for each entity with the max(bytes), but I need to return all the columns. My problem is that I have DB entities that back up the same amount day after day, so if I go back 30 days and look for the max bytes, that specific entity returns multiple records. The query only seems to work when I include just ENTITY and MAX(BYTES), without the date and number (i.e. session ID). So how do I get the MAX results without duplicates? If there is a duplicate I want the most recent record. Here is what I currently have, which mostly works except for the DB entities that return duplicates. I've searched all over and this kind of query seems to be problematic.

select
 varchar(max_sum.entity,45) as entity,
 translate('abcde fg:hi:jk', a.start_time, '_____abcde_fg_hi_jk_____',' ') AS start_time,
 translate('abcde fg:hi:jk', a.end_time, '_____abcde_fg_hi_jk_____',' ') AS end_time,
 a.number,
 max_sum.max_MB
from (select varchar(entity,45) as entity, MAX(bytes/1024/1024) as max_MB
      from summary
      where activity='BACKUP'
        and start_time>=(current_timestamp - 7 days)
        and entity is not NULL
      group by entity) as max_sum, summary a
where a.entity=max_sum.entity
  and (a.bytes/1024/1024)=max_sum.max_MB
  and a.activity='BACKUP'
  and a.start_time>=(current_timestamp - 7 days)
  and a.entity is not NULL
order by max_sum.entity
Chad Small
  • Does this answer your question? [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) This appears to be a form of greatest-n-per-group query. – Clockwork-Muse Apr 03 '20 at 20:38

1 Answer


I don't understand the relationship between your query and the data. But for what you are asking, use row_number() or rank():

select s.*
from (select s.*,
             row_number() over (partition by entity order by bytes desc, start_time desc) as seqnum
      from summary s
      where . . .
     ) s
where seqnum = 1;
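
For reference, plugging the filters and formatting from your current query into that pattern might look like the sketch below (the 7-day window, the translate-based timestamp formatting, the MB conversion, and the column names are all taken from your original query, so adjust as needed):

select varchar(entity,45) as entity,
       translate('abcde fg:hi:jk', start_time, '_____abcde_fg_hi_jk_____',' ') as start_time,
       translate('abcde fg:hi:jk', end_time, '_____abcde_fg_hi_jk_____',' ') as end_time,
       number,
       bytes/1024/1024 as max_MB
from (select s.*,
             row_number() over (partition by entity
                                order by bytes desc, start_time desc) as seqnum
      from summary s
      where activity = 'BACKUP'
        and start_time >= (current_timestamp - 7 days)
        and entity is not NULL
     ) s
where seqnum = 1
order by entity;

Because row_number() numbers the rows within each entity starting at the largest bytes value (ties broken by the latest start_time), keeping only seqnum = 1 returns exactly one row per entity with all of its columns.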
Gordon Linoff
  • The table is basically a log of all backups and we want to find the largest backup during the month for each server and need the date and session # along with the amount that was backed up. My assumption is that they want to see if certain servers do a monthly job on a specific date each month that the server owner has not informed us of. This data might help us adjust backup start times so they don't run long when those jobs happen. I can dump the table to excel and easily do this in excel but I would think DB2 should be able to do it just as easily. – Chad Small Apr 03 '20 at 16:48
  • @GordonLinoff You should add `, START_TIME desc` at the end of `over` to address the requirement: `If there is a duplicate I want the most recent record`. – Mark Barinstein Apr 03 '20 at 16:59
  • That appears to be the answer with a little modification. The table I gave you is a subset of columns from a larger table. I should have clarified that...my fault. Once I took your script and ran it, I worked out how to return just the columns I needed and got the data I was after. – Chad Small Apr 03 '20 at 19:38