0

** EDIT **

SELECT DATEADD(MONTH, SEQ4(), to_date('2020-12-01')) AS "REPORTING MONTH" FROM TABLE (GENERATOR(ROWCOUNT => 12))

Root cause is the seq4() function is causing the issue when there are large number of records , is there any alternative to seq4() ( tried seq8() too but din't work either)

I am trying to generate a cartesian product of results from two tables in snowflake.

As you can see in second query M was generating results for all the months of 2020 and cross joined with Query A that returns around 1717 results. Final output has 18887 results.

Can you please let me know the right way to use cross join ?

Question

  • However the "REPORTING MONTH" field has three invalid values '2021-05-01' , '2022-01-01' , '2022-09-01' for each row in Query A, i.e

Expected
A1 , 2020-01-01
A1 , 2020-02-01
A1 , 2020-03-01
A1 , 2020-04-01
A1 , 2020-05-01
A1 , 2020-06-01
A1 , 2020-07-01
A1 , 2020-08-01
A1 , 2020-09-01
A1 , 2020-10-01
A1 , 2020-11-01
A1 , 2020-12-01

Error Actual
A1 , 2020-01-01
A1 , 2020-02-01
A1 , 2020-03-01
A1 , 2020-04-01
A1 , 2020-05-01
A1 , 2020-06-01
A1 , 2020-07-01
A1 , 2020-08-01
A1 , 2020-09-01
A1 , 2021-05-01 (invalid)
A1 , 2022-01-01 (invalid)
A1 , 2022-09-01 (invalid)


WITH A AS (
            (SELECT * FROM (SELECT TO_TIMESTAMP_NTZ(TO_VARCHAR(EVENT_TIME, 'YYYY-MM-01')) AS "EVENT REPORTING MONTH", *,
                RANK() OVER (PARTITION BY ID, SERVICE_TYPE ORDER BY EVENT_TIME) AS RANK FROM "MY_TABLE") AS E
                    WHERE RANK = 1 AND DATEDIFF(MONTH , DATE("EVENT REPORTING MONTH") ,CURRENT_DATE())  > 0) 
    ),
     M AS (
    SELECT DATEADD(MONTH, SEQ4(), to_date('2020-01-01')) AS "REPORTING MONTH" FROM TABLE (GENERATOR(ROWCOUNT => 12))
     ),
   F AS (
    SELECT * FROM A cross JOIN M
   )
   SELECT * FROM F order by "ID","REPORTING MONTH" DESC
Mozhi
  • 757
  • 1
  • 11
  • 28
  • Hi - please provide the CREATE TABLE sql for "MY_TABLE" and also the data it contains. Thanks – NickW Jan 18 '21 at 10:13
  • @NickW I found that it is due to SELECT DATEADD(MONTH, SEQ4(), to_date('2020-12-01')) , seq4() function , if i hardcode the value of months without sequence function it was working. Is there anything wrong with the seq function usage in the above query ? – Mozhi Jan 18 '21 at 10:15
  • When I run that SQL I just get 12 records (from 12/2020 to 11/2021) which is what I would expect. Are you getting something different? I'm running: SELECT DATEADD(MONTH, SEQ4(), to_date('2020-12-01')) AS "REPORTING MONTH" FROM TABLE (GENERATOR(ROWCOUNT => 12)); – NickW Jan 18 '21 at 10:21
  • @NickW when sequence is used in cross join with large number of results it produces gaps in sequence values and hence the issue – Mozhi Jan 18 '21 at 10:29
  • 1
    As the SQL containing the sequence works perfectly when run on it's own (unless you disagree), the only way for someone to investigate this further is for you to provide the structure and data (a subset will do as long as it still shows the behaviour on your system) of "MY_TABLE" – NickW Jan 18 '21 at 10:33
  • I agree the problem is somewhere else. The Seq operator is correct. Cross join also works. The code you provided can actually not yield dates after 2020. – Rick Jan 18 '21 at 10:50
  • @Zephro Similar issue posted here - seems like sequence functions are not helpful in case of large number of results. My source table has around 18887 results https://stackoverflow.com/questions/54348801/generate-series-equivalent-in-snowflake – Mozhi Jan 18 '21 at 10:52
  • But the seq operator is only used to generate 12 rows in your code. These rows are then joined with your x rows from A. The join is an inner join and in effect will only return dates from 2020-01 to 2020-12 – Rick Jan 18 '21 at 10:55
  • @Zephro , that's right but somehow if i replace seq function with hardcoded values , results were valid but using seq4()/seq8() results are wrong. – Mozhi Jan 18 '21 at 11:06

1 Answers1

1

The SEQ4 / SEQ8 can have gaps, as documented here: https://docs.snowflake.com/en/sql-reference/functions/seq1.html

Try to use row_number instead. Here is your example with row_number:

SELECT
DATEADD(
    MONTH,
    row_number() over ( order by 1 ),
    to_date('2020-12-01')
) AS "REPORTING MONTH"
FROM
    TABLE (GENERATOR(ROWCOUNT = > 12))
Rick
  • 2,080
  • 14
  • 27