1

I have two tables, call them "monthlyStoreCount" and "weeklySales"

monthlyStoreCount

date storeCount
2022-01-01 89
2022-02-01 94
... ...

weeklySales

date sales
2021-12-31 66
2022-01-07 16
2022-01-14 147
2022-01-21 185
2022-01-28 145
2022-04-04 2572
... ...

I am looking to join these tables to get the "storeCount" and latest "sales" as of the dates in the monthlyStoreCount table.

Is there any performant way to do this join? With the data shown the desired output would be:

date storeCount sales
2022-01-01 89 66
2022-02-01 94 145
... ... ...
JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
MYK
  • 1,988
  • 7
  • 30
  • What RDBMs? lateral/cross apply would be useful table value functions here to return top most >= date. Limit 1 Example if MS SQL Server https://dba.stackexchange.com/questions/173183/using-cross-apply-with-group-by-and-top-1-with-duplicate-data – xQbert Oct 04 '22 at 14:49
  • Using Snowflake – MYK Oct 04 '22 at 14:57
  • Also, I can solve it using logic as shown here: https://stackoverflow.com/questions/73792180/how-to-get-the-most-recent-event-per-date-in-sql?noredirect=1#comment130310620_73792180 But it feels sub-optimal. – MYK Oct 04 '22 at 14:59
  • https://docs.snowflake.com/en/sql-reference/constructs/join-lateral.html is what you're after I think. It uses coorlation to get the desired values and you can limit for the top 1. So your subquery will join monthly to weekly via lateral and the sub query will execute returning the closest date > the date from monthly. but you limit results to the top 1 record ordered ascending. – xQbert Oct 04 '22 at 15:30
  • @MYK What do you mean by sub-optimal? If you're talking about performance, it's helpful to include the execution plan and other details such as --how much data is being moved around, size of the warehouse, execution time, any disk spills etc. – Radagast Oct 04 '22 at 17:32

2 Answers2

1

UNTESTED:

Using: https://docs.snowflake.com/en/sql-reference/constructs/join-lateral.html as a primer...

"for each row in left_hand_table LHT: execute right_hand_subquery RHS using the values from the current row in the LHT"

Lateral allows us to execute the sub query for each record in the Monthly Store Count. So we get the MSC record whose date is >= ws date. ordered by weekly sales date descending and get the 1st record (the one closest to the monthly store count date which is equal to or before that date.)

SELECT MSC.Date, MSC.StoreCount,  sWS.Sales
FROM monthlyStoreCount as MSC, 
     LATERAL (SELECT WS.Sales 
              FROM WeeklySales as WS 
              WHERE MSC.date>= WS.date 
              ORDER BY WS.Date DESC LIMIT 1) as sWS
ORDER BY MSC.Date ASC;
xQbert
  • 34,733
  • 2
  • 41
  • 62
  • I don't know about anyone else but this line in the doc link, "Just as INNER JOIN syntax can use either the comma or the words “INNER JOIN”, a lateral join can also use the comma or the words INNER JOIN." scares me. Does this imply a , is the same as Inner join in Snowflake? my poor trusted ANSI standards. – xQbert Oct 04 '22 at 18:57
  • 1
    I tested it for you on Snowflake. Throws `SQL compilation error: Unsupported subquery type cannot be evaluated` – Radagast Oct 04 '22 at 19:48
  • 1
    https://stackoverflow.com/questions/58655268/getting-error-sql-compilation-error-unsupported-subquery-type-cannot-be-evalua ugh... I'll go for phil's response then :P – xQbert Oct 04 '22 at 21:42
1

Instead of using a cartesian product, what if you stack them up and look for the date that occurs right before the date for monthly store counts?

with cte as
(select date, storeCount, 1 as is_monthly
 from monthlyStoreCount 
 union all
 select date, sales, 0 as is_monthly
 from weeklySales)
  

select *, lag(storeCount) over (order by date asc, is_monthly asc)
from cte
qualify is_monthly=1;

Hmm....It appears there is one way to make xQbert's lateral join solution work. By slapping an aggregate on it. I don't know why Snowflake doesn't allow the same using limit/top 1.

select *
from monthlyStoreCount as m,
     lateral (select array_agg(w.sales) within group(order by w.date desc)[0] as sales
              from WeeklySales as w 
              where m.date>= w.date)
Radagast
  • 5,102
  • 3
  • 12
  • 27
  • Note: I didn't do this in my solution, but since we're stacking two different tables, it's a good idea to use sensible columns names in the post-union all table/cte instead of inferring it based on the order of tables – Radagast Oct 04 '22 at 18:20
  • I could be wrong but my understanding of a cross apply/ lateral is that it doesn't behave like a join. it's a table value functional. As such, it wouldn't result in a Cartesian. Plus, as the inline view and coorlation are executed once for each subquery and quit after the "TOP MOST" elment is found; it can be rahter quick. But this is based on SQL Server, Oracle, and MySQL,postgresql engines. I'm not sure of snowlfake. so as such, testing is required. – xQbert Oct 04 '22 at 18:55
  • 1
    @xQbert Sorry, I was referring to the solution linked by OP in the comments. I do think lateral join is a better way to go about this assuming Snowflake supports the solution you posted (they are little quirky with lateral joins) – Radagast Oct 04 '22 at 19:02
  • 1
    I read more into it than I should I'm not blameless :P I went back to read the doc to see how lateral is implemented witin the Snowflake engine only to see this text: " lateral join can also use the comma or the words INNER JOIN" holy .... that scares me Still upvoted this as it's a viable. option and creative. – xQbert Oct 04 '22 at 19:21