Calculate the first and second most maximum value at every row and the average of both snowflake SQL

Question

I have a table with the following schema:

uid	visit name	visit date	sales quantity
xyz	visit 1	2020-01-01	29
xyz	visit 2	2020-01-03	250
xyz	visit 3	2020-01-04	20
xyz	visit 4	2020-01-27	21
abc	visit 1	2020-02-01	29
abc	visit 2	2020-03-03	34
abc	visit 3	2020-04-04	35
abc	visit 4	2020-04-27	41

base table sales

Each unique id has a few unique visits that repeat for every unique id, at every visit I have to calculate what the two most highest sales quantity is per user- across their prior visits(ascending order) up until the current visit named in the row for each unique id and excluding the current row.

output would be- the same table plus these columns

max sale	2nd max sale	avg of both max sales

output table

I have used window functions for the maximum value, but I am struggling to get the second highest value of sales for every user for every row. Is this doable using sql? If so what would the script look like?

Please: 1) post the data and desired output as a text or insert statements (to make it reusable). 2) if you've already tried some code, then post it and describe what was wrong with it. — astentx, Nov 15 '21 at 23:03
you can use `dense_rank()` and order by total sales quantity desc then as a derived table filter rank_col < 3 to get that top 2 of the same id or something — RoMEoMusTDiE, Nov 15 '21 at 23:13
dense rank would not work if I had to give a rank at every row per prior rows right?? — mmdhna, Nov 15 '21 at 23:32

Felipe Hoffa · Answer 1 · 2021-11-17T00:55:50.587

Update: I re-wrote my answer, because the previous one ignored certain requirements.

To keep track of the 2 previous top values, you can write a UDTF in JS to hold that ranking:

create or replace function udtf_top2_before(points float)
returns table (output_col array)
language javascript
as $$
{
    processRow: function f(row, rowWriter, context){
        rowWriter.writeRow({OUTPUT_COL: this.prevmax.slice().reverse()});
        this.prevmax.push(row.POINTS);
        // silly js sort https://stackoverflow.com/a/21595293/132438
        this.prevmax = this.prevmax.sort(function (a, b) {return a - b;}).slice(-2);
    }        
    , initialize: function(argumentInfo, context) {
        this.prevmax = [];
    }
}
$$;

Then that tabular UDF can will give you the numbers as expected:

with data as (
    select v:author::string author, v:score::int score, v:subreddit, v:created_utc::timestamp ts
    from reddit_comments_sample
    where v:subreddit = 'wallstreetbets'
)


select author, score, ts
    , output_col[0] prev_max
    , output_col[1] prev_max2
    , (prev_max+ifnull(prev_max2,prev_max))/2 avg
from (
    select author, score, ts, output_col 
    from data, table(udtf_top2_before(score::float) over(partition by author order by ts))
    order by author, ts
    limit 100
)

UDTF based on my previous post:

https://towardsdatascience.com/sql-puzzle-optimization-the-udtf-approach-for-a-decay-function-4b4b3cdc8596

Previously:

You can use row_number() over() to select the top 2, and then pivot with an array_agg():

with data as (
    select v:author author, v:score::int score, v:subreddit, v:created_utc::timestamp ts
    from reddit_comments_sample
    where v:subreddit = 'wallstreetbets'
)

select author, arr[0] max_score, arr[1] max_score_2, (max_score+ifnull(max_score_2,max_score))/2 avg
from (
    select author
        , array_agg(score) within group (order by score::int desc) arr
    from (
        select author, score, ts
        from data
        qualify row_number() over(partition by author order by score desc) <= 2
    )
    group by 1
)
order by 4 desc

Hi Felipe, Thank you for your response. does this involve timestamp? Basically, for every author at a current row I will have to set the rank based off of rows that account for the author in a previous timestamp. — mmdhna, Nov 16 '21 at 18:13

Calculate the first and second most maximum value at every row and the average of both snowflake SQL

1 Answers1