2

I'm working to create a database view that associates an online order with a users preceding website visits. This is for an eCommerce website so one user may visit and order several times.

I've already joined the visits table and orders table on user_id and associated the nearest less-than session time with an order time. Now, I'm hoping to say every visit up until order #1 is "1", and then after visit thereafter up until order #2 is "2". Also, if there is no order_id for that specific user I'd like to return "0". See screenshots linked below for reference.

I've already tried to use dense_rank, but it is only ranking the rows in which an order_id is present. I want to carry forward these ranks.

SELECT v.id AS visit_id,
    v.user_id,
    v.started_at AS visit_date,
    dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
    dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
    o.id AS order_id,
    o.created_at AS order_date
   FROM visits v
     FULL JOIN orders o ON v.user_id = o.user_id AND v.started_at < o.created_at AND o.created_at < (( SELECT min(visits.started_at) AS min
           FROM visits
          WHERE visits.user_id = v.user_id AND visits.started_at > v.started_at)) AND (v.started_at + '24:00:00'::interval) > o.created_at
  GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
  ORDER BY v.started_at;

Current results Expected Results

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • First, it appears that you have a visit for every order, so `left join` is more appropriate than `full join`. Second, what are the numbers for the visits after the last order? – Gordon Linoff Apr 12 '19 at 22:12
  • There are several orders that are not associated with a website visit, so we did not want to exclude these (think manual inputs and such). Any visits that happened after the last order should appear as "null" until the next order happens. – Brandon Thomson Apr 12 '19 at 22:18

2 Answers2

0

Use lag to check if the previous row is non-null so it can be flagged for a new group start. Once the flag is set, you can use a running sum to define groups.

SELECT T.*,
       1+SUM(FLAG) OVER(PARTITION BY user_id ORDER BY visit_date) AS order_number
FROM (
SELECT v.id AS visit_id,
    v.user_id,
    v.started_at AS visit_date,
    dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
    o.id AS order_id,
    o.created_at AS order_date,
    --conditioncheck with lag
    case when lag(o.id) over(partition by v.user_id order by v.started_at) is not null then 1 else 0 end as flag
   FROM visits v
     FULL JOIN orders o ON v.user_id = o.user_id AND v.started_at < o.created_at AND o.created_at < (( SELECT min(visits.started_at) AS min
           FROM visits
          WHERE visits.user_id = v.user_id AND visits.started_at > v.started_at)) AND (v.started_at + '24:00:00'::interval) > o.created_at
    ) T
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
  • Thanks Vamsi. This almost solves my issue. One thing I did change in your code was partition by v.user_id in your condition check. Unfortunately, this continues to add 1 to the order number even if there isn't another order that has happened yet. For example, if I visit the website after my latest order, but I haven't made another order, I would ideally like this to be listed as "null" until they place their next order. I'm assuming I will need another condition but I'm not sure how to go about it. – Brandon Thomson Apr 12 '19 at 21:40
0

The GROUP BY seems unnecessary, but I'll leave it it. You basically need a cumulative sum.

I would assign all visits before a particular order the order number:

SELECT v.id AS visit_id, v.user_id,
       v.started_at AS visit_date,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
       o.id AS order_id,
       o.created_at AS order_date,
       count(o.id) over (partition by v.user_id order by v.started_at) as order_number
FROM visits v FULL JOIN
     orders o
     ON v.user_id = o.user_id AND
        v.started_at < o.created_at AND
        o.created_at < (SELECT min(visits.started_at)
                        FROM visits v2 
                        WHERE v2.user_id = v.user_id AND 
                              v2.started_at > v.started_at) AND
        (v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;

I think this is the logic that you want:

SELECT v.id AS visit_id, v.user_id,
       v.started_at AS visit_date,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
       o.id AS order_id,
       o.created_at AS order_date,
       MIN(o.order_number) OVER (PARTITION BY v.user_id ORDER BY v.started_at DESC) as order_number
FROM visits v FULL JOIN
     (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.id) as order_number
      FROM orders o
     ) o
     ON v.user_id = o.user_id AND
        v.started_at < o.created_at AND
        o.created_at < (SELECT min(visits.started_at)
                        FROM visits v2 
                        WHERE v2.user_id = v.user_id AND 
                              v2.started_at > v.started_at) AND
        (v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;

It might produces NULLs where you want 0s, however.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for this response, Gordon. I have the group by clause in place because there are several aggregates that I omitted in this example code (to keep it concise). After testing your second bit of code, the only issue I'm having is that the row for which there is an order number doesn't take the right value. For example, all visits leading to the first order show "0" as expected, but then the row with order details also shows "0". In addition, all visits after the most recent order show an order number when one hasn't been placed yet. – Brandon Thomson Apr 12 '19 at 22:20