sql max/min query and data transformation

Question

update: changed one time to show that the times per shipment may not be in sequential order always.

here is my input

create table test
(
    shipment_id int,
    stop_seq tinyint,
    time datetime
)

insert into test values (1,1,'2009-8-10 8:00:00')
insert into test values (1,2,'2009-8-10 9:00:00')
insert into test values (1,3,'2009-8-10 10:00:00')
insert into test values (2,1,'2009-8-10 13:00:00')
insert into test values (2,2,'2009-8-10 14:00:00')
insert into test values (2,3,'2009-8-10 20:00:00')
insert into test values (2,4,'2009-8-10 18:00:00')

the output that i want is below

shipment_id  start    end
-----------  -----    ---
     1        8:00    10:00
     2        13:00   18:00

i need to take the time from the min(stop) row for each shipment and the time from the max(stop) row and place in start/end respectively. i know this can be done with multiple queries rather easily but i am looking to see if a single select query can do this.

thanks!

MyItchyChin · Accepted Answer · 2009-08-10T20:40:29.450

4

I think the only way you'll be able to do it is with sub-queries.

SELECT shipment_id
    , (SELECT TOP 1 time 
        FROM test AS [b] 
        WHERE b.shipment_id = a.shipment_id 
        AND b.stop_seq = MIN(a.stop_seq)) AS [start]
    , (SELECT TOP 1 time 
        FROM test AS [b] 
        WHERE b.shipment_id = a.shipment_id 
        AND b.stop_seq = MAX(a.stop_seq)) AS [end]
FROM test AS [a]
GROUP BY shipment_id

You'll need to use the DATEPART function to chop up the time column to get your exact output.

edited Aug 10 '09 at 20:40

answered Aug 10 '09 at 20:21

MyItchyChin

13,733
1
24
44

the times may not always be in sequential order...i forgot to show that in my example...updated my question to reflect this. – thomas Aug 10 '09 at 20:22
this works. thanks. i knew it could be done with multiple selects but the purist in me was looking for alternatives. – thomas Aug 10 '09 at 20:43
@thomas : If you create a clustered index on shipment_id and stop_seq the impact of doing the sub-queries will be negligible because all of the clauses are sargable and will perform index seeks. – MyItchyChin Aug 10 '09 at 21:19
@thomas: If you have a shipments table, see the addendum to my solution for a more efficient alternative to CptSkippy's solution, but along the same lines. – Steve Kass Aug 10 '09 at 23:25
1

@CptSkippy: Not everyone might realize that your TOP 1 is there to avoid getting an error if the subqueries return more than one row. Personally, I'd leave TOP 1 out for two reasons. First, TOP 1 without ORDER BY is a bad pattern to use in general, because it's nondeterministic, and second, because if the subquery returns more than one row, chances are that there is a flaw in design or logic somewhere that is only being hidden. Just my 2c worth. Your solution is a good one, though, and probably as efficient as any! – Steve Kass Aug 10 '09 at 23:31
@Steve: All valid points and I agree with you to an extent. I think it really boils down to the amount of control you have over the systems you're working with. – MyItchyChin Aug 11 '09 at 01:25

score 1 · Answer 2 · answered Aug 10 '09 at 20:53

Use a Common Table Expression (CTE) - this works (at least on my SQL Server 2008 test system):

WITH SeqMinMax(SeqID, MinID, MaxID) AS
(
    SELECT Shipment_ID, MIN(stop_seq), MAX(stop_seq)
    FROM test
    GROUP BY Shipment_ID
)
SELECT 
    SeqID 'Shipment_ID',
    (SELECT TIME FROM test 
       WHERE shipment_id = smm.seqid AND stop_seq = smm.minid) 'Start',
    (SELECT TIME FROM test 
       WHERE shipment_id = smm.seqid AND stop_seq = smm.maxid) 'End'
FROM seqminmax smm

The SeqMinMax CTE selects the min and max "stop_seq" values for each "shipment_id", and the rest of the query then builds on those values to retrieve the associated times from the table "test".

CTE's are supported on SQL Server 2005 (and are a SQL:2003 standard feature - no Microsoft "invention", really).

Marc

score 0 · Answer 3 · answered Aug 10 '09 at 20:38

0

Am I correct in thinking that you want the first time rather than the 'min' time, and the last time in the sequence rather than the 'max' time?

answered Aug 10 '09 at 20:38

fras

123
1
3
8

that is correct. the time associated with the first stop (i.e. min(stop_seq)) and the time associated with the last stop (i.e. max(stop_seq)) – thomas Aug 10 '09 at 20:40

score 0 · Answer 4 · answered Aug 10 '09 at 20:49

SELECT C.shipment_id, C.start, B2.time AS stop FROM
(    
   SELECT A.shipment_id, B1.time AS start, A.max_stop_seq FROM
   (
      SELECT shipment_id, MIN(stop_seq) as min_stop_seq, MAX(stop_seq) as max_stop_seq 
      FROM test
      GROUP BY shipment_id
   ) AS A

   INNER JOIN 

   (
      SELECT shipment_id, stop_seq, time FROM test
   ) AS B1

   ON A.shipment_id = B1.shipment_id AND A.min_stop_seq = B1.stop_seq
) AS C

INNER JOIN

(
   SELECT shipment_id, stop_seq, time FROM test
) AS B2

ON C.shipment_id = B2.shipment_id AND C.max_stop_seq = B2.stop_seq

score 0 · Answer 5 · answered Aug 10 '09 at 21:29

select t1.shipment_id, t1.time start, t2.time [end]
from (
    select shipment_id, min(stop_seq) min, max(stop_seq) max
    from test
    group by shipment_id
) a
inner join test t1 on a.shipment_id = t1.shipment_id and a.min = t1.stop_seq 
inner join test t2 on a.shipment_id = t2.shipment_id and a.max = t2.stop_seq

Steve Kass · Answer 6 · 2009-08-10T23:26:52.780

I suggest you take advantage of row_number and PIVOT. This may look messy, but I think it will perform well, and it's more adaptable to various assumptions. For example, it doesn't assume that the latest datetime value corresponds to the largest stop_seq value for a given shipment.

with test_ranked(shipment_id,stop_seq,time,rankup,rankdown) as (
  select
    shipment_id, stop_seq, time,
    row_number() over (
      partition by shipment_id
      order by stop_seq
    ),
    row_number() over (
      partition by shipment_id
      order by stop_seq desc
    )
  from test
), test_extreme_times(shipment_id,tag,time) as (
  select
    shipment_id, 'start', time
  from test_ranked where rankup = 1
  union all
  select
    shipment_id, 'end', time
  from test_ranked where rankdown = 1
)
  select
    shipment_id, [start], [end]
  from test_extreme_times
  pivot (max(time) for tag in ([start],[end])) P
  order by shipment_id;
  go

The PIVOT isn't really needed, but it's handy. However, do note that the MAX inside the PIVOT expression doesn't do anything useful. There's only one [time] value for each tag, so MIN would work just as well. The syntax requires an aggregate function in this position.

Addendum: Here's an adaptation of CptSkippy's solution that may be more efficient than using MIN and MAX if you have a shipments table:

SELECT shipment_id
    , (SELECT TOP 1 time 
        FROM test AS [b] 
        WHERE b.shipment_id = a.shipment_id 
        ORDER BY stop_seq ASC) AS [start]
    , (SELECT TOP 1 time 
        FROM test AS [b] 
        WHERE b.shipment_id = a.shipment_id 
        ORDER BY stop_seq DESC) AS [end]
FROM shipments_table AS [a];

thanks steve. this looks a little more complicated than I am looking for given the specific need and the fact that I already have a bunch of CTEs in my stored procedure and don't want to clutter it up entirely :-) I do like the row_number() with partition...i need to research that some more to understand its capabilities better. thanks for you help and suggestions! — thomas, Aug 11 '09 at 02:10

sql max/min query and data transformation

6 Answers6

Linked