This is a deceptively expensive query; timeseries analysis is always hard in SQL-like environments. The PARTITION BY
clause you have written requires all of the data for a single extid to be present in memory on a single machine, which is overloading it and causing your resources exceeded error.
You can mitigate this RAM requirement by having a ROWS
clause to limit the scope of your partition. Here is an example:
SELECT extid, stamp, ds, amount, note, balance
FROM (
SELECT
extid, stamp, ds, amount, note, balance,
MAX(tenth_stamp) OVER(PARTITION BY extid) AS target_stamp
FROM (
SELECT extid, stamp, ds, amount, note, balance,
MIN(stamp) OVER (
PARTITION BY extid
ORDER BY stamp DESC
ROWS BETWEEN CURRENT ROW AND 9 FOLLOWING
) AS tenth_stamp
FROM
[monte.ledger2_trailing_21d]
WHERE ds >= '2015-02-09'
)
)
WHERE stamp >= target_stamp
ORDER BY extid, stamp
LIMIT 300
The inner-most sub-select extracts your data and a field tenth_stamp
that holds the least stamp of the 10 rows examined. Using MIN()
makes this work even when there are fewer than 10 rows for any given extid
.
The middle sub-select finds the largest tenth_stamp
for each extid
. This is the tenth total stamp for that extid
. The outer SELECT can then restrict the result to only rows with a stamp
within the ten most recent stamp
for their respective extid
, giving you the desired result.
When executed, this takes a total of 4 stages. It will not run fast, but never requires large amounts of data in a single location. Hope that helps!