1

In my rails app I have a typical ecommerce schema inside a Postgres 9.6 database. Here's a simplified version of it:

users table
  :name

products table
  :name

shopping_carts table
 :user_id

line_items table
  :price
  :qty
  :product_id
  :shopping_cart_id

I've got a working query to return the number of distinct products bought by each user:

SELECT COUNT(distinct p.*), u.name FROM products p
INNER JOIN line_items l ON p.id = l.product_id
INNER JOIN shopping_carts sc ON l.shopping_cart_id = sc.id
INNER JOIN users u ON sc.user_id = u.id
GROUP BY u.name

But I also want a count of products for each user that only that particular user has purchased. A possible method for this in Ruby (once everything was set up with ActiveRecord) might look something like:

def unique_prod(user)
  user.products.select { |p| p.users.length == 1 }.count
end

But how to do it in SQL? I think I need to do it using two counts - one for the number of different user_ids in a given product's shopping_carts (let's call this count user_count), and then a count of products for which user_count = 1. I'm having trouble incorporating the multiple COUNT and GROUP BY statements in working fashion. Any suggestions?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
TDB
  • 367
  • 1
  • 3
  • 15

1 Answers1

1

To do it all in one query:

SELECT scl.user_id, u.name, ct_dist_prod, ct_dist_prod_exclusive
FROM  (
   SELECT sc.user_id
        , count(DISTINCT l.product_id) AS ct_dist_prod
        , count(DISTINCT l.product_id)
                FILTER (WHERE NOT EXISTS (
                           SELECT 1
                           FROM   shopping_carts sc1
                           JOIN   line_items     l1 ON l1.shopping_cart_id = sc1.id
                           WHERE  l1.product_id = l.product_id 
                           AND    sc1.user_id  <> sc.user_id)) AS ct_dist_prod_exclusive
   FROM   shopping_carts sc
   JOIN   line_items     l ON l.shopping_cart_id = sc.id
   GROUP  BY 1
   ) scl
JOIN   users u ON u.id = scl.user_id;

I added the user_id to the result, because I cannot assume that name is defined unique (which would make your original query slightly incorrect).

The aggregate FILTER clause requires Postgres 9.4 or later:

How?

Assuming referential integrity enforced by a FK constraint, you do not need to join to the table products at all for this query.

Neither, at first, to the users table. The basic query boils down to:

SELECT sc.user_id, count(DISTINCT l.product_id)
FROM   shopping_carts sc
JOIN   line_items     l ON l.shopping_cart_id = sc.id
GROUP  BY 1;

Add the 2nd count to this cheaper query, where all rows with products are excluded for which another row with the same product and a different user exists (i.e. bought by a different user, too).

Then join to users to add the name. Cheaper.


Computing only the exclusive count is simpler. Example:

SELECT sc.user_id, count(DISTINCT l.product_id) AS ct_dist_prod_exclusive
FROM   shopping_carts  sc
JOIN   line_items      l ON l.shopping_cart_id = sc.id
LEFT   JOIN (
             shopping_carts sc1
       JOIN  line_items     l1 ON l1.shopping_cart_id = sc1.id
       ) ON l1.product_id = l.product_id 
    AND sc1.user_id <> sc.user_id
WHERE  l1.product_id IS NULL
GROUP  BY 1;

Note the essential parentheses.

Related:

Or (in response to your comment):

SELECT user_id, count(*) AS ct_dist_prod_exclusive
FROM  (
   SELECT max(user_id) AS user_id, l1.product_id
   FROM   line_items l1
   INNER  JOIN shopping_carts sc1 ON l.shopping_cart_id = sc1.id
   GROUP  BY l1.product_id
   HAVING COUNT(DISTINCT sc1.user_id) = 1  -- DISTINCT!
   ) p1
GROUP  BY user_id;

HAVING COUNT(DISTINCT sc1.user_id) = 1 because

products purchased by only one user

allows the product to be bought by the same user multiple times.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks. In your answer it looks like you rely on a user_id foreign key in the line_items table, but I don't have that in my schema. – TDB Apr 02 '17 at 13:41
  • @TDB: Right, I adapted the queries accordingly. – Erwin Brandstetter Apr 03 '17 at 02:49
  • 1
    I created this query (based on your original answer) which gives me a different result. Thoughts? (FYI my sql skills are pretty junior) SELECT sc.user_id, count(DISTINCT l.product_id) FROM shopping_carts sc INNER JOIN line_items l on l.shopping_cart_id = sc.id INNER JOIN ( SELECT l1.product_id FROM line_items l1 INNER JOIN shopping_carts sc1 ON l1.shopping_cart_id = sc1.id GROUP BY l1.product_id HAVING COUNT(sc1.user_id) = 1 ) one_user ON one_user.product_id = l.product_id GROUP BY sc.user_id – TDB Apr 03 '17 at 03:20
  • @TDB: Good approach. But it would have to be `HAVING COUNT(DISTINCT sc1.user_id)`, and you can simplify some more. I added another query. – Erwin Brandstetter Apr 03 '17 at 03:22