1

I've found a similar question (Duplicating records to fill gap between dates in Google BigQuery), however with a different scenario and the answer there does not apply.

I have data structured like so (which is basically price-change history for multiple products and partners):

+------------+---------+---------+-------+
|    date    | product | partner | value |
+------------+---------+---------+-------+
| 2017-01-01 | a       | x       |    10 |
| 2017-01-01 | b       | x       |    15 |
| 2017-01-01 | a       | y       |    11 |
| 2017-01-01 | b       | y       |    16 |
| 2017-01-05 | b       | x       |    13 |
| 2017-01-07 | a       | y       |    15 |
| 2017-01-07 | a       | x       |    15 |
+------------+---------+---------+-------+

What I need is a query (specifically written in BigQuery Standard SQL) that, given a date range (in this case, 2017-01-01 to 2017-01-10), outputs the following result:

+--------------+---------+---------+-------+
|      date    | product | partner | value |
+--------------+---------+---------+-------+
|   2017-01-01 | a       | x       |    10 |
|   2017-01-02 | a       | x       |    10 |
|   2017-01-03 | a       | x       |    10 |
|   2017-01-04 | a       | x       |    10 |
|   2017-01-05 | a       | x       |    10 |
|   2017-01-06 | a       | x       |    10 |
|   2017-01-07 | a       | x       |    15 |
|   2017-01-08 | a       | x       |    15 |
|   2017-01-09 | a       | x       |    15 |
|   2017-01-10 | a       | x       |    15 |
|   2017-01-01 | a       | y       |    11 |
|   2017-01-02 | a       | y       |    11 |
|   2017-01-03 | a       | y       |    11 |
|   2017-01-04 | a       | y       |    11 |
|   2017-01-05 | a       | y       |    11 |
|   2017-01-06 | a       | y       |    11 |
|   2017-01-07 | a       | y       |    15 |
|   2017-01-08 | a       | y       |    15 |
|   2017-01-09 | a       | y       |    15 |
|   2017-01-10 | a       | y       |    15 |
|   2017-01-01 | b       | x       |    15 |
|   2017-01-02 | b       | x       |    15 |
|   2017-01-03 | b       | x       |    15 |
|   2017-01-04 | b       | x       |    15 |
|   2017-01-05 | b       | x       |    13 |
|   2017-01-06 | b       | x       |    13 |
|   2017-01-07 | b       | x       |    13 |
|   2017-01-08 | b       | x       |    13 |
|   2017-01-09 | b       | x       |    13 |
|   2017-01-10 | b       | x       |    13 |
|   2017-01-01 | b       | y       |    16 |
|   2017-01-02 | b       | y       |    16 |
|   2017-01-03 | b       | y       |    16 |
|   2017-01-04 | b       | y       |    16 |
|   2017-01-05 | b       | y       |    16 |
|   2017-01-06 | b       | y       |    16 |
|   2017-01-07 | b       | y       |    16 |
|   2017-01-08 | b       | y       |    16 |
|   2017-01-09 | b       | y       |    16 |
|   2017-01-10 | b       | y       |    16 |
+--------------+---------+---------+-------+

Basically a price history with all date gaps filled, for every combination of product and partner.

I'm having a hard time figuring out how to get this done, especially how to generate multiple rows for the same date where no price change has happened. Any ideas?

Community
  • 1
  • 1
devnull
  • 443
  • 6
  • 13
  • What have you tried so far? I don't understand from your question how you would end up with multiple rows per date. What determines that? – Elliott Brossard Feb 07 '17 at 19:15
  • @ElliottBrossard the premise that I need to repeat the most recent value for every combination of product and partner every day that their value hasn't changed. So if I have two products and two partners, on days with no value change there should be 4 rows with the latest value – devnull Feb 07 '17 at 19:17

1 Answers1

4

Try below

#standardSQL
WITH history AS (
  SELECT '2017-01-01' AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
  SELECT '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
  SELECT '2017-01-01' AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
  SELECT '2017-01-01' AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
  SELECT '2017-01-05' AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
  SELECT '2017-01-07' AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
  SELECT '2017-01-07' AS d, 'a' AS product, 'x' AS partner, 15 AS value 
),
daterange AS (
  SELECT date_in_range  
  FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-10')) AS date_in_range 
),
temp AS (
  SELECT d, product, partner, value, LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d
  FROM history
  ORDER BY product, partner, d
)
SELECT date_in_range, product, partner, value
FROM daterange
JOIN temp
ON daterange.date_in_range >= PARSE_DATE('%Y-%m-%d', temp.d) 
AND (daterange.date_in_range < PARSE_DATE('%Y-%m-%d', temp.next_d) OR temp.next_d IS NULL)
ORDER BY product, partner, date_in_range
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230