3

So I've found similar resources that address how to do this in SQL, like this: Duplicating records to fill gap between dates

I understand that BigQuery may not be the best place to do this, so I'm trying to see if it's at all possible. When trying to run some of the methods in the link above above I'm hitting a wall as some of the functions aren't supported within BigQuery.

If a table exists with data structured like so:

    MODIFY_DATE             SKU         STORE   STOCK_ON_HAND
    08/01/2016 00:00:00     1120010     21      100
    08/05/2016 00:00:00     1120010     21      75
    08/07/2016 00:00:00     1120010     21      40

How can I build a query within Google BigQuery that yields an output like the one below? A value at a given date is repeated until the next change for the dates in between:

    MODIFY_DATE             SKU         STORE   STOCK_ON_HAND
    08/01/2016 00:00:00     1120010     21      100
    08/02/2016 00:00:00     1120010     21      100
    08/03/2016 00:00:00     1120010     21      100
    08/04/2016 00:00:00     1120010     21      100
    08/05/2016 00:00:00     1120010     21      75
    08/06/2016 00:00:00     1120010     21      75
    08/07/2016 00:00:00     1120010     21      40

I know I need to generate a table that has all the dates within a given range, but I'm having a hard time understanding if this can be done. Any ideas?

Community
  • 1
  • 1
jab
  • 33
  • 3

2 Answers2

3

How can I build a query within Google BigQuery that yields an output like the one below? A value at a given date is repeated until the next change for the dates in between

See example below

SELECT
  MODIFY_DATE, 
  MAX(SKU_TEMP) OVER(PARTITION BY grp) AS SKU,
  MAX(STORE_TEMP) OVER(PARTITION BY grp) AS STORE,
  MAX(STOCK_ON_HAND_TEMP) OVER(PARTITION BY grp) AS STOCK_ON_HAND,
FROM (
  SELECT
    DAY AS MODIFY_DATE, SKU AS SKU_TEMP, STORE AS STORE_TEMP, STOCK_ON_HAND AS STOCK_ON_HAND_TEMP,
    COUNT(SKU) OVER(ORDER BY DAY ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp,
  FROM (
    SELECT DATE(DATE_ADD(TIMESTAMP("2016-08-01"), pos - 1, "DAY")) AS DAY
    FROM (
         SELECT ROW_NUMBER() OVER() AS pos, *
         FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP("2016-08-07"), TIMESTAMP("2016-08-01")), '.'),'') AS h
         FROM (SELECT NULL)),h
    )))
  ) AS DATES
  LEFT JOIN (
    SELECT DATE(MODIFY_DATE) AS MODIFY_DATE, SKU, STORE, STOCK_ON_HAND 
    FROM 
      (SELECT "2016-08-01" AS MODIFY_DATE, "1120010" AS SKU, 21 AS STORE, 75 AS STOCK_ON_HAND),
      (SELECT "2016-08-05" AS MODIFY_DATE, "1120010" AS SKU, 22 AS STORE, 100 AS STOCK_ON_HAND),
      (SELECT "2016-08-07" AS MODIFY_DATE, "1120011" AS SKU, 23 AS STORE, 40 AS STOCK_ON_HAND),
  ) AS TABLE_WITH_GAPS
  ON TABLE_WITH_GAPS.MODIFY_DATE = DATES.DAY
)
ORDER BY MODIFY_DATE
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thanks Mikhail, this works as a sample but need to adjust this a bit for a dynamic range of dates and SKU's. Great work man, appreciate the insight on it. – jab Sep 21 '16 at 22:04
  • 1
    Hey @Mikhail, I have the exact same issue as the original poster, and your answer works fine when I only have a single SKU in my table, however when I have multiple SKUs and Stores, and need to fill the date for each of those, your query doesn't really work. Also, I'd like to see the solution in Standard SQL. Would you kindly update your answer? Thanks! – devnull Feb 07 '17 at 16:55
  • 1
    @GustavoDaniloMachado - my answer was "tailored" for the particular question and is not intended to fit all cases. If you post new question - i will be more than glad to answer yours :o) – Mikhail Berlyant Feb 07 '17 at 18:15
  • I see @MikhailBerlyant, here is the one I posted: http://stackoverflow.com/questions/42097764/duplicate-groups-of-records-to-fill-multiple-date-gaps-in-google-bigquery – devnull Feb 07 '17 at 19:26
0

I need to generate a table that has all the dates within a given range, but I'm having a hard time understanding if this can be done. Any ideas?

SELECT DATE(DATE_ADD(TIMESTAMP("2016-08-01"), pos - 1, "DAY")) AS DAY
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP("2016-08-07"), TIMESTAMP("2016-08-01")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230