1

I've read several articles giving examples of what can MATCH_RECOGNIZE do. One of them is creating a step funnel. Let's say we want to track certain events and see after each of the event how many users left. For instance, arriving at the home page, then going to the search page, then putting something in the cart, and finally paying. Those are events, and we have records for each one of them. Now I would like to create funnel like this:

  1. Arriving at home page - 1000 users
  2. Going to search page - 980 users
  3. Adding something to shopping cart - 90 users
  4. Paying - 10 users

That's step funnel example, where we have fewer and fewer users after each step. Now, back to match_recognize, we can use this function to tell us how many users matched this pattern (event1 + event2 + event3 + event4 +), but the problem I'm trying to solve, how can we use this function so we know how many users did not pass to next event/phase, not just the ones that matched whole pattern sequence?

datahack
  • 477
  • 1
  • 11
  • 32
  • Wouldn't that just be a NOT IN type of query where you're seeing users that don't match the output of the MATCH_RECOGNIZE function? – Mike Walton Feb 10 '22 at 14:00
  • Hey Mike, thanks for pitching in. Match_recognize gives you only users that passed all 4 conditions(events), you do not know how many did not pass event2 and then how many did not pass event3..basically after each step how many fell off. – datahack Feb 10 '22 at 14:36
  • Oh, I see. I didn't realize you wanted them at each step. Could you run the function with each combination of steps and then cross match those to get what you were looking for? In other words run the function with `event1` and `event1 + event2` and so on? – Mike Walton Feb 10 '22 at 14:45
  • Yes, that came to my mind but what if there are 10 events, it is not very efficient to write 10 statements, so I'm looking for a better way :) – datahack Feb 10 '22 at 14:55
  • Makes sense. Sorry I couldn't be more helpful. – Mike Walton Feb 10 '22 at 15:16

1 Answers1

0

So using bits, and date limits per step you can build you own funnel logic, and decide what makes the cut or not:

WITH data(id, action_date, details) AS (
    SELECT * FROM VALUES
        (1,  '2022-03-02', 'home_page'),
        (1,  '2022-03-03', 'search'),
        (1,  '2022-03-04', 'add_cart'),
        (1,  '2022-03-05', 'pay'),

        (2,  '2022-03-02', 'home_page'),
        (2,  '2022-03-03', 'search'),
        (2,  '2022-03-04', 'add_cart'),

        (3,  '2022-03-02', 'home_page'),
        (3,  '2022-03-03', 'search'),

        (4,  '2022-03-02', 'home_page'),

        (5,  '2022-03-03', 'home_page'), -- missed the search step
        (5,  '2022-03-04', 'add_cart'),
    
        (6,  '2022-03-01', 'home_page'), -- gap between s1 & s2 too long
        (6,  '2022-03-05', 'search') 
), prep_a AS (
    SELECT id
        ,action_date
        ,details
        ,lag(details)over (partition by id order by action_date) as prior_detail
        ,datediff('seconds',lag(action_date,1,action_date)over(partition by id order by action_date),action_date)/86400 as prior_action_days_gap
        ,iff(details = 'home_page', action_date, null) as chain_start_date
        ,case
            when details = 'home_page' then 1
            when details = 'search' and prior_detail = 'home_page' and prior_action_days_gap <= 2 then 2
            when details = 'add_cart' and prior_detail = 'search' and prior_action_days_gap <= 1 then 4
            when details = 'pay' and prior_detail = 'add_cart' and prior_action_days_gap <= 1 then 8
            else 0
         end chain_bits
    FROM data
    --ORDER BY 1,2;
)
SELECT 
    chain_date
    ,count_if(funnel_home_page) as c_home_pages_step
    ,count_if(funnel_search) as c_search_step
    ,c_home_pages_step - c_search_step as c_search_step_drop
    ,count_if(funnel_cart) as c_cart_step
    ,c_search_step - c_cart_step as c_cart_step_drop
    ,count_if(funnel_pay) as c_pay_step
    ,c_cart_step_drop - c_pay_step as c_pay_step_drop
FROM (
    SELECT *
        ,sum(chain_bits)over(partition by id, chain_date order by action_date) as chain_state
        ,chain_bits=1 as funnel_home_page
        ,chain_bits=2 and chain_state = 3 as funnel_search
        ,chain_bits=4 and chain_state = 7 as funnel_cart
        ,chain_bits=8 and chain_state = 15 as funnel_pay
    FROM (
        SELECT *
            ,nvl(chain_start_date, lag(chain_start_date) ignore nulls over (partition by id order by action_date)) as chain_date
        FROM prep_a
    )
)
GROUP BY 1

gives:

CHAIN_DATE C_HOME_PAGES_STEP C_SEARCH_STEP C_SEARCH_STEP_DROP C_CART_STEP C_CART_STEP_DROP C_PAY_STEP C_PAY_STEP_DROP
2022-03-02 4 3 1 2 1 1 0
2022-03-03 1 0 1 0 0 0 0
2022-03-01 1 0 1 0 0 0 0
Simeon Pilgrim
  • 22,906
  • 3
  • 32
  • 45