1

I have a table with hiredate (Date) ,First Name (String) and Sur Name (string) like this:

hireDate    First Name      Surname
13-oct-14   Cintia Roxana   Padilla Julca
28-oct-14   Conor           McAteer
28-oct-14   Paolo           Mesia Macher
28-oct-14   William Anthony Whelan
15-nov-14   Peter Michael   Coates
13-feb-15   Natalie         Conche
15-mar-15   Beatriz         Vargas Huanca
01-may-15   Walter          Calle Chenccnes
04-may-15   Sarah Louise    Price

And I made a view of a frequency of hire_dates(DATE) and the cumulative frequency in the other column like this:

Row hireDate    Count       Cumulative
1   13/10/2014  1           1
2   28/10/2014  3           4
3   15/11/2014  1           5
4   13/02/2015  1           6
5   15/03/2015  1           7
6   09/04/2015  1           8
7   15/04/2015  1           9
8   01/05/2015  1           10

And the query goes like this:

WITH
Data AS (
 SELECT
 hireDate,
 COUNT(1) AS Count
 FROM
 `human-resources-221122.human_resources.employees_view`
 WHERE
 status <> "cancelled"
 GROUP BY
 1 )

SELECT
hireDate,
Count,
SUM(Count) OVER (ORDER BY hireDate ASC ROWS BETWEEN UNBOUNDED PRECEDING 
AND CURRENT ROW) AS Cumulative
FROM
Data
ORDER BY
hireDate ASC

But I need to see the numbers by month and year with zeros in those places that there are no count, something like this:

Hire_Month  Hire_Year   Count   Cumulative
October     2014        4       4
November    2014        1       5
December    2014        0       5
January     2015        0       5
February    2015        1       6
March       2015        1       7
April       2015        2       9
May         2015        1       10

Thanks in advance.

Renato Galvez
  • 83
  • 1
  • 6

1 Answers1

0

Note the use of GENERATE_DATE_ARRAY and RIGHT JOIN to get the desired results:

WITH data AS (
  SELECT * 
  FROM UNNEST ([
    STRUCT(DATE("2014-12-03") AS d, 4 AS a)
    , STRUCT("2015-01-05", 7)
    , STRUCT("2015-03-05", 1)
  ])
), all_months AS (
   SELECT month
   FROM UNNEST(GENERATE_DATE_ARRAY(
     (SELECT DATE_TRUNC(MIN(d), MONTH) FROM data)
     , (SELECT MAX(d) FROM data)
     , INTERVAL 1 MONTH)
   ) AS month
)


SELECT month, IFNULL(SUM(a),0) a, SUM(SUM(a)) OVER(ORDER BY month) a_cum
FROM data 
RIGHT JOIN all_months
ON DATE_TRUNC(d, MONTH)=month
GROUP BY month
ORDER BY month

enter image description here

Now, if we are just counting, you can use the fact that a LEFT/RIGHT JOIN will have nulls for empty months. This is how the query could be adapted to an arbitrary table (Wikipedia here):

WITH data AS (
  SELECT *, DATE(datehour) d
  FROM `fh-bigquery.wikipedia_v3.pageviews_2018` 
  WHERE wiki='pt'
    AND (datehour BETWEEN '2018-09-30' AND '2018-09-30'
      OR datehour BETWEEN '2018-12-01' AND '2018-12-02'
    )
    AND title LIKE 'Calif%'
), all_months AS (
   SELECT month
   FROM UNNEST(GENERATE_DATE_ARRAY(
     (SELECT DATE_TRUNC(MIN(d), MONTH) FROM data)
     , (SELECT MAX(d) FROM data)
     , INTERVAL 1 MONTH)
   ) AS month
)


SELECT month, COUNT(d) c, SUM(COUNT(d)) OVER(ORDER BY month) a_cum
FROM data 
RIGHT JOIN all_months
ON DATE_TRUNC(d, MONTH)=month
GROUP BY month
ORDER BY month

enter image description here

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • Interesting, but i have a 1000 rows of hire dates from the table, i don't know if i can add this table into the query – – Renato Galvez Feb 18 '19 at 13:49
  • add a table? all you need is `GENERATE_DATE_ARRAY` and `RIGHT JOIN` – Felipe Hoffa Feb 18 '19 at 16:40
  • in your example you show 3 examples hire dates that you struct one bye one with STRUCT(DATE("2014-12-03") AS d, 4 AS a) as your first input date, but in my table i don't have 3, i have 1000 hire dates, my question is: do i have to struct one by one as your example???, or there is a way to struct from a column's name?? – Renato Galvez Feb 18 '19 at 23:54
  • I did that in my query because I don't have your table. Have I had your table, I would have just used that. – Felipe Hoffa Feb 19 '19 at 07:42
  • i update the question with the table in the beginning, please check. – Renato Galvez Feb 19 '19 at 16:38
  • I don't understand why you can't use "hiredate" instead of my sample table. All you need to do is use your table instead of mine – Felipe Hoffa Feb 20 '19 at 01:32
  • you are inputting manually the count of every hire date in the Struct[], and use the first hire date as your start count, how can i struct my table??, i don't understand the first six lines of your query – Renato Galvez Feb 20 '19 at 23:59
  • Forget about my table. I'm doing things manually so I can create fake data, because otherwise the query won't run. You don't need to create fake data. Just replace `FROM data` with `FROM wherever_your_dataset_table` is. Instead of `d`, ask for the column `hireDate`. – Felipe Hoffa Feb 21 '19 at 00:15
  • Have you tried running the query in BigQuery? All you need to do is replace `FROM data` once you do... – Felipe Hoffa Feb 21 '19 at 00:18
  • yes i have, i replace your data with my data and it don't recognize the letter "a"in the last 5 lines of your query, i put: WITH data AS ( SELECT STRUCT(hireDate AS d, 4 AS a) FROM `Table`) – Renato Galvez Feb 21 '19 at 00:20
  • and also i tried this: WITH data AS ( SELECT hireDate FROM `table`), with another error in the 5 same last lines of your query – Renato Galvez Feb 21 '19 at 00:29
  • Oopps on `a`. Replace `a` with 1. I'll update the post with a query showing how to use the same query with an arbitrary table. – Felipe Hoffa Feb 21 '19 at 00:37
  • i put: WITH data AS ( SELECT hiredate FROM `TABLE` ), all_months AS ( SELECT month FROM UNNEST(GENERATE_DATE_ARRAY( (SELECT DATE_TRUNC(MIN(hiredate), MONTH) FROM data) , (SELECT MAX(hiredate) FROM data) , INTERVAL 1 MONTH) ) AS month ) SELECT month, IFNULL(SUM(1),0) 1, SUM(SUM(1)) OVER(ORDER BY month) a_cum FROM data RIGHT JOIN all_months ON DATE_TRUNC(1, MONTH)=month GROUP BY month ORDER BY month , gives me a Syntax error: Unexpected integer literal "1" at [14:32] – Renato Galvez Feb 22 '19 at 16:28
  • see my second query, write `IFNULL(SUM(1),0) a, SUM(SUM(1))` – Felipe Hoffa Feb 22 '19 at 16:29
  • ON DATE_TRUNC(1, MONTH)=month ----give me this error: No matching signature for function DATE_TRUNC for argument types: INT64, DATE_TIME_PART. Supported signature: DATE_TRUNC(DATE, DATE_TIME_PART) at [30:3] – Renato Galvez Feb 24 '19 at 04:56
  • it works now, but i have 1 instead zeros in those "null" months – Renato Galvez Feb 25 '19 at 20:54
  • Right! I fixed that, see second query – Felipe Hoffa Feb 25 '19 at 21:19
  • weird, I thought I had edited it yesterday - check now – Felipe Hoffa Feb 26 '19 at 20:18