2

Okay so I suck at SQL... If I have a table Forecasts in SQL Server (2015+) with columns CityName, WeatherProviderName, TimeStamp, TempCelcius, And what I want out of it is the forecasts for every city from every weather provider that has been reported in the last 2 days, plus the latest forecast for every City+Provider that has not forecast anything in the last 2 days.

Is there a single query to get this that is more performant that just requesting the 2 separately? I know I can get the latest values (based off this article) like this:

select t.CityName, t.WeatherProviderName, t.TimeStamp, t.TempCelcius from Forecasts t
  inner join (
    select CityName, WeatherProviderName, TempCelcius, max(TimeStamp) as maxTime 
    from Forecasts where TimeStamp < DATEADD(DAY,-2, GETDATE())
    group by CityName, WeatherProviderName
    ) tm on t.CityName = tm.CityName AND t.WeatherProviderName = tm.WeatherProviderName AND t.TimeStamp = tm.maxTime

(also tried with the partition from that SO question, but it was more than 3 times slower for my test set)

I can get everything for the last 2 days with:

select CityName, WeatherProviderName, TimeStamp, TempCelcius from Forecasts where TimeStamp > DATEADD(DAY,-2, GETDATE())

but rather than running them both into 2 collections and combining, is there a way to get them both in a quick single query?

Note on Answer

I went with the union as suggested by @Forklift (thanks) in the comment below. It's the fastest of the options suggested. It looked like this:

SELECT t.CityName, t.WeatherProviderName, t.TimeStamp, t.TempCelcius FROM 
Forecasts t
  INNER JOIN (
    SELECT CityName, WeatherProviderName, TempCelcius, max(TimeStamp) AS maxTime 
    FROM Forecasts WHERE TimeStamp < DATEADD(DAY,-2, GETDATE())
    GROUP BY CityName, WeatherProviderName
    ) tm ON t.CityName = tm.CityName AND t.WeatherProviderName = tm.WeatherProviderName AND t.TimeStamp = tm.maxTime
    UNION
    SELECT CityName, WeatherProviderName, TimeStamp, TempCelcius FROM Forecasts WHERE TimeStamp > DATEADD(DAY,-2, GETDATE())

I have also marked @SqlZsm as the answer because it does do it in a single query... so depending on your exact need you have either @Forklift or @SqlZsm to thank :)

Community
  • 1
  • 1
Wizbit
  • 421
  • 1
  • 4
  • 13
  • 1
    You can use `UNION` to merge the two result sets if you're simply interested in making one call to SQL. – Forklift Mar 09 '17 at 18:23
  • True; I suppose that does at least remove the round trip. Was hoping there might be something that didn't run the 2 iterations of the DB. Not hyper critical. As much for my education as my need. – Wizbit Mar 09 '17 at 18:27
  • I think the problem is that they are actually 2 different queries with similar shaped return data. I'm not sure how you would try to get those distinct data sets back by asking SQL only one thing. – Forklift Mar 09 '17 at 18:29

2 Answers2

3

This returns all rows from the last two days, and the most recent row for those that do not have a row within the last two days using a subquery with row_number():

select s.CityName, s.WeatherProviderName, s.TimeStamp, s.TempCelcius 
from (
  select t.CityName, t.WeatherProviderName, t.TimeStamp, t.TempCelcius 
    , rn = row_number() over (
        partition by t.CityName, t.WeatherProviderName
        order by t.TimeStamp desc
        )
  from Forecasts t
  ) as s
where s.TimeStamp > dateadd(day,-2, getdate())
  or rn = 1

rextester demo: http://rextester.com/YQS70477

test setup:

create table Forecasts (
    CityName varchar(32)
  , WeatherProviderName varchar(32)
  , TimeStamp datetime
  , TempCelcius float
)
insert into Forecasts values 
 ('Sierra Leon','CNN','19881230',30)
,('Sierra Leon','CNN','19881231',30)
,('Sierra Leon','BBC','19881231',30)
,('Sierra Leon','BBC',dateadd(day,-2, getdate()),28)
,('Sierra Leon','BBC',dateadd(day,-1, getdate()),29)
,('Sierra Leon','BBC',getdate(),30)

query:

select s.CityName, s.WeatherProviderName, s.TimeStamp, s.TempCelcius 
from (
  select t.CityName, t.WeatherProviderName, t.TimeStamp, t.TempCelcius 
    , rn = row_number() over (
        partition by t.CityName, t.WeatherProviderName
        order by t.TimeStamp desc
        )
  from Forecasts t
  ) as s
where s.TimeStamp > dateadd(day,-2, getdate())
  or rn = 1

returns:

+-------------+---------------------+---------------------+-------------+
|  CityName   | WeatherProviderName |      TimeStamp      | TempCelcius |
+-------------+---------------------+---------------------+-------------+
| Sierra Leon | BBC                 | 09.03.2017 19:49:06 |          30 |
| Sierra Leon | BBC                 | 08.03.2017 19:49:06 |          29 |
| Sierra Leon | CNN                 | 31.12.1988 00:00:00 |          30 |
+-------------+---------------------+---------------------+-------------+
SqlZim
  • 37,248
  • 6
  • 41
  • 59
  • No I am not after this. I am after the latest values only up to 2 days ago (see typo note below), then everything that has happened since then... For example if CNN have not put out a forecast for Sierra Leon since 1988 then I want that one... but I'll take all their NewYork forcasts for the last 2 days only. Sorry typo on the first query... should have been -2 not -5. Doesn't make a huge difference other than confusion; but corrected now... – Wizbit Mar 09 '17 at 18:39
  • @Wizbit removed the `where` in the subquery that was based on that. – SqlZim Mar 09 '17 at 18:42
  • @SqlZsm this would give me the same as my first query with the < changed to a > wouldn't it? And as per comment in the post I found the partition a lot slower than the inner join – Wizbit Mar 09 '17 at 18:45
  • @Wizbit I added a demo, does that make more sense now? Ranking window functions (like `row_number() over()`) and aggregation window functions (like `max() over()`) can have different performance results. I would suggest you test it before saying it will be slower. -- the end result may be slower than the combined time of your previous individual queries, but if you were to `union` them, you will incur the cost of a `sort` operator to remove duplicates. – SqlZim Mar 09 '17 at 18:49
  • @SqlZsm. Neat; I missed the _OR_. So yep single query; but on the 400k row table I'm testing on it comes out slower than the union: Union = CPU time = 612 ms, elapsed time = 79 ms; partition = CPU time = 1994 ms, elapsed time = 187 ms. Still I think those are likely to be the best pair of answers... – Wizbit Mar 09 '17 at 19:02
  • @Wizbit If that's the case I would go with the `union`. – SqlZim Mar 09 '17 at 19:07
  • @SqlZsm to be honest teaching me of the existence of rextester.com is enough get the upvote... thanks for that and the education on better use of partition – Wizbit Mar 10 '17 at 15:16
0

You could try this

select CityName, WeatherProviderName, TimeStamp, TempCelcius from Forecasts where TimeStamp > DATEADD(DAY,-2, GETDATE())
union all
select * from (
    select CityName, WeatherProviderName, TimeStamp, TempCelcius, row_number() over (partition by concat(CityName, WeatherProviderName) order by TimeStamp desc) as rn from Forecasts where TimeStamp < DATEADD(DAY,-2, GETDATE())
)
where rn = 1
T0t3sMcG0t3s
  • 310
  • 1
  • 8
  • This is effectively the union approach mentioned by @Forklift with the partition approach to the latest value that I already mentioned in my original post; which I said was the same as the inner join, but slower by a factor of 3. – Wizbit Mar 10 '17 at 14:58
  • I guess I'm not sure what you are after then. Is this still an open issue then? If @SqlZim has the best answer, you should mark it as the answer. – T0t3sMcG0t3s Mar 10 '17 at 15:10