I have a datalake on AWS, queried using Athena, with the following structure and sample data
Key | Date | Value
----+---------------+-------
a | 01/01/2020 | 4.5
a | 05/01/2020 | 6
a | 06/01/2020 | 3.2
b | 01/01/2020 | 2.4
b | 03/01/2020 | 5
I would like to run a query to extract values
for a specific date
and for each key
. If date is not a known one, like 99% of the time, value should be returned as a linear interpolation of the two closest one.
Dates
are here reported in dd/mm/YYYY format for simplicity, but in the datalake are stored as timestamps.
Example of result
If I want to get the values
for the 2nd of January (02/01/2020), the expected output is
Key | Date | Value
----+---------------+-------
a | 02/01/2020 | 4.875
b | 02/01/2020 | 3.70
Where 4.875 is the linear interpolation between 4.5 (value at 01/01/2020) and 6 (value at 05/01/2020). I have manually evaluated it as (y - 4.5) / (2 - 1) = (6 - 4.5) / (5 - 1)
(see linear interpolation for more reference).
Same for 3.7
How can I achieve that (if possible) with one single query?
Assumption: we always have a smaller and bigger date from the point we're searching for.
Update - Athena, based on PrestoDB, does not support JOIN LATERAL
, so that is not an option I can consider