0

Suppose I have a table "arrivals" which logs when an item arrives at a location. Each line corresponds to an arrival event.

arrivals
========
LocationId     ItemId     TimeStamp
------------------------------------------
a              1          2015-11-24 11:00  (item 1 arrived at 11:00 at location a)
a              2          2015-11-24 11:01  (item 2 arrived at 11:00 at location a)
b              1          2015-11-24 11:05  (item 1 left location a and arrived at b)
b              2          2015-11-24 11:06  (item 2 left location a and arrived at b)

Is there an SQL statement that can tell me which items where at location A at 11:03, possibly without bringing the SQL server to its knees?

(This would include all items that have arrived at location A before 11:03, and have not arrived anywhere else since then)

There are usually about 30 items per location at the same time, 100 different locations, and about 1000 arrivals per location per hour.

Are there any design choices that would simplify such queries, even if it means denormalising the database?

Currently I cannot even begin to imagine an sql command that could get me this information, without going through the entire table in a custom loop.

HugoRune
  • 13,157
  • 7
  • 69
  • 144

2 Answers2

2

I use DISTINCT in case an item can go to one location and return

This check for every item on location A before '2015-11-24 11:03' But not exists in any other location after '2015-11-24 11:03'

SELECT DISTINCT  ItemId     
FROM arrivals lA   -- location A
WHERE LocationId = 'A'
  AND lA.TimeStamp < '2015-11-24 11:03'
  AND NOT EXISTS (SELECT 1
                  FROM arrivals lX -- any other location
                  WHERE lX.LocationId <> lA.LocationId     
                    AND lX.ItemId = lA.ItemId     
                    AND lX.TimeStamp >= '2015-11-24 11:03'
                 )

NOTE: You may like double check < and >= to match your need

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • This looks promising, but I am worried about the performance. With 1000 items per location per hour, the table of arrivals will have millions of entries after only 10 days; and this looks like it will require several scans per query. – HugoRune Nov 24 '15 at 14:38
  • You need `CREATE INDEX` for `LocationId`, `ItemId` and `TimeStamp` and maybe a composite index with all three fields. The `EXISTS` is very efficient, if found something using the composite index will return false and move on. – Juan Carlos Oropeza Nov 24 '15 at 14:40
  • First create single INDEX. Check the execution PLAN http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan Then add the composite index and check PLAN again. – Juan Carlos Oropeza Nov 24 '15 at 14:41
  • I dont think is need it. But if you still want to improve perfomance and you can add some other restriction like one item **cant** be in a location for more than one day you can use `lA.TimeStamp between '2015-11-23 11:03' AND '2015-11-24 11:03'` to reduce the number of row to compare – Juan Carlos Oropeza Nov 24 '15 at 15:02
1

If you want to know about all the items that arrived at that location at the given time or earlier, this should do the trick:

SELECT distinct ItemId FROM arrivals WHERE LocationId = 'a' AND TimeStamp <= '2015-11-24 11:03'

EDIT: And taking into account the fact, that something might have left the location after it arrived and still before our timestamp I suggest this one:

SELECT arrivals_prim.itemId FROM
    (SELECT itemId, max("TimeStamp") as last_arrival
        FROM arrivals
        WEHRE 
            "TimeStamp" <= '2015-11-24 11:03' 
            and locationId = 'a'
        GROUP BY itemId) arrivals_prim
    LEFT JOIN (SELECT itemId, max("TimeStamp") as last_departure
        FROM arrivals
        WEHRE
            "TimeStamp" < '2015-11-24 11:03'
            and locationId <> 'a'
        GROUP BY itemId) arrivals_bis
    ON (arrivals_prim.itemId = arrivals_bis.itemId)
    WEHRE last_departure is null or last_departure < last_arrival
KotBehemot
  • 111
  • 8