17

I am trying to get daily sum of sales from a google big-query table. I used following code for that.

select Day(InvoiceDate) date, Sum(InvoiceAmount) sales from test_gmail_com.sales 
where year(InvoiceDate) = Year(current_date()) and
Month(InvoiceDate) = Month(current_date())
group by date order by date

From the above query it gives only the sum of sales daily which were in the table. There is a chance that some days do not have any sales. For those kind of situations, I need to get the date and sum should be 0. As an example, in every month should 30 0r 31 rows with sum of sales. Examples show below. 4th day of the month does not have a sales. So its sum should be 0.

date | sales
-----+------
1    |   259
-----+------
2    |   359
-----+------
3    |   45
-----+------
4    |    0
-----+------
5    |  156

Is it possible to do in Big-query? Basically date column should be a series from 1 - 28/29/30 or 31st depending on the month of the year

Manura Omal
  • 1,005
  • 3
  • 10
  • 26

5 Answers5

55

Generting a list of dates and then joining whatever table you need on top seems the easiest. I used the generate_date_array + unnest and it looks quite clean.

To generate a list of days (one day per row):

  SELECT
  *
  FROM 
    UNNEST(GENERATE_DATE_ARRAY('2018-10-01', '2020-09-30', INTERVAL 1 DAY)) AS example
Cohen
  • 640
  • 1
  • 6
  • 6
12

You can use below to generate on fly all dates in given range (in below example it is all dates from 2015-06-01 till CURRENT_DATE() - by changing those you can control which dates range to generate)

SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))

so, now - you can use it with LEFT JOIN with your table to have all dates accounted. See potential example below

SELECT
  calendar_day,
  IFNULL(sales, 0) AS sales
FROM (
  SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
  FROM (
       SELECT ROW_NUMBER() OVER() AS pos, *
       FROM (FLATTEN((
       SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
       FROM (SELECT NULL)),h
  )))
) AS all_dates
LEFT JOIN (
  SELECT DAY(InvoiceDate) DATE, SUM(InvoiceAmount) sales 
  FROM test_gmail_com.sales 
  WHERE YEAR(InvoiceDate) = YEAR(CURRENT_DATE()) AND
  MONTH(InvoiceDate) = MONTH(CURRENT_DATE())
  GROUP BY DATE 
)
ON DATE = calendar_day  

I wanna need to get previous months sales

Below gives all days of previous month

SELECT DATE(DATE_ADD(DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY"), pos - 1, "DAY")) AS calendar_day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(DATE_ADD(CURRENT_DATE(), - DAY(CURRENT_DATE()), "DAY"), DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • 2
    The above solution works with bigquery legacy. As per https://stackoverflow.com/questions/38306016/populating-a-table-with-all-dates-in-a-given-range-in-google-bigquery , for the latest version, the below query works SELECT day FROM UNNEST( GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY) ) AS day – mhn Jun 05 '19 at 22:01
4

Using the Standard SQL dialect and the generate_array function to simplify the code:

WITH serialnum AS (
  SELECT
    sn
  FROM
    UNNEST(GENERATE_ARRAY(0, 
                          DATE_DIFF(DATE_ADD(DATE_TRUNC(CURRENT_DATE()
                                                      , MONTH)
                                          , INTERVAL 1 MONTH)
                                  , DATE_TRUNC(CURRENT_DATE(), MONTH)
                                  , DAY) - 1)
                          ) AS sn
), date_seq AS (
SELECT
    DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH),
            INTERVAL(sn) DAY) AS this_day
FROM
  serialnum
)
SELECT
    Day(InvoiceDate) date
    , Sum(IFNULL(InvoiceAmount, 0)) sales
FROM
    date_seq
    LEFT JOIN
    test_gmail_com.sales
ON
    date_seq.this_day = DAY(test_gmail_com.sales.InvoiceDate)
WHERE
    year(InvoiceDate) = Year(current_date())
    and
    Month(InvoiceDate) = Month(current_date())
GROUP BY
    date
ORDER BY
    date
;

UPDATE

Or, simpler still using the generate_date_array function:

WITH date_seq AS (
SELECT
  GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), MONTH), 
                      DATE_ADD(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH)
                                        , INTERVAL 1 MONTH)
                               , INTERVAL -1 DAY)
                      , INTERVAL 1 DAY)
    AS this_day
)
SELECT
    Day(InvoiceDate) date
    , Sum(IFNULL(InvoiceAmount, 0)) sales
FROM
    date_seq
    LEFT JOIN
    test_gmail_com.sales
ON
    date_seq.this_day = DAY(test_gmail_com.sales.InvoiceDate)
WHERE
    year(InvoiceDate) = Year(current_date())
    and
    Month(InvoiceDate) = Month(current_date())
GROUP BY
    date
ORDER BY
    date
;
blueogive
  • 518
  • 6
  • 11
  • generate_date_array() is really nice! The only small drawback is that it does not appear in BigQuery's documentation. But it works really fine – Sourygna May 17 '18 at 10:09
  • it appears now in BigQuery documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#generate_date_array – shankshera Oct 26 '19 at 00:14
1

For these purposes it is practical to have a 'calendar' table, a table that just lists all the days within a certain range. For your specific question, it would suffice to have a table with the numbers 1 to 31. A quick way to get this table is to make a spreadsheet with these numbers, save it as a csv file and import this file into BigQuery as a table.

You then left outer join your result set onto this table, with ifnull(sales,0) as sales.

If you want the number of days per month (28--31) to be right, you basically have two options. Either you create a proper calendar table that covers several years and that you join on using year, month and day. Or you use the simple table with numbers 1--31 and remove numbers based on the month and the year.

oulenz
  • 1,199
  • 1
  • 15
  • 24
  • Isn't there a way like in here http://stackoverflow.com/questions/11391085/getting-date-list-in-a-range-in-postgresql. – Manura Omal Aug 01 '16 at 09:23
  • There is nothing equivalent to `generate_series`, so you can't easily generate a list on the fly. Once you have a list of numbers in a table, you _can_ use the function `date_add` to generate a list of dates by offsetting today's date by x days. (`date_add` is defined here: https://cloud.google.com/bigquery/query-reference#date_add) – oulenz Aug 01 '16 at 09:36
  • Dates are not equal for each months and it depends on the year as well(number of days in feb). So how to overcome that? – Manura Omal Aug 01 '16 at 09:50
0

For Standard SQL

WITH

splitted AS (
  SELECT
    *
  FROM
    UNNEST( SPLIT(RPAD('',
          1 + DATE_DIFF(CURRENT_DATE(), DATE("2015-06-01"), DAY),
          '.'),''))),
  with_row_numbers AS (
  SELECT
    ROW_NUMBER() OVER() AS pos,
    *
  FROM
    splitted),
  calendar_day AS (
  SELECT
    DATE_ADD(DATE("2015-06-01"), INTERVAL (pos - 1) DAY) AS day
  FROM
    with_row_numbers)
SELECT
  *
FROM
  calendar_day
ORDER BY
  day DESC
martin.code
  • 1,291
  • 9
  • 12