3

I have the following two tables:

CREATE TABLE products 
(
    id INT,
    created_at DATE,
    sold_at DATE
);

CREATE TABLE product_prices 
(
    id INT,
    product_id INT,
    price numeric,
    created_at DATE
);

The data model logic works as follows:

  1. When a new product is put for sale, a record is inserted into products with the current date as created_at.
  2. At the same time, a record is inserted into product_prices with the same date in created_at, an FK reference to the product and a price.
  3. If a product is sold, the sold_at is set on the products record.
  4. If the product changes price throughout its sales period a new record is added to product_prices with the created_at date the price was changed. This means, that if you wanna know what price a product has at a given date, then you need to check what the price was at that date by looking at product_prices.

Now imagine that I have seed data something like this:

SELECT * FROM products;
id created_at sold_at
1 2022-01-25T00:00:00.000Z 2022-02-18T00:00:00.000Z
2 2022-01-26T00:00:00.000Z
3 2022-01-28T00:00:00.000Z 2022-01-30T00:00:00.000Z
4 2022-02-01T00:00:00.000Z 2022-02-01T00:00:00.000Z
5 2022-02-01T00:00:00.000Z 2022-02-15T00:00:00.000Z
6 2022-02-10T00:00:00.000Z 2022-02-13T00:00:00.000Z
7 2022-02-14T00:00:00.000Z
8 2022-02-19T00:00:00.000Z
9 2022-02-20T00:00:00.000Z 2022-02-22T00:00:00.000Z
10 2022-02-22T00:00:00.000Z

and

SELECT * FROM product_prices;
id product_id price created_at
1 1 100.0 2022-01-25T00:00:00.000Z
2 1 95.0 2022-02-02T00:00:00.000Z
3 1 85.0 2022-02-17T00:00:00.000Z
4 2 89.0 2022-01-26T00:00:00.000Z
5 2 85.0 2022-01-30T00:00:00.000Z
6 3 91.0 2022-01-28T00:00:00.000Z
7 4 50.0 2022-02-01T00:00:00.000Z
8 5 100.0 2022-02-01T00:00:00.000Z
9 5 99.0 2022-02-03T00:00:00.000Z
10 6 79.0 2022-02-10T00:00:00.000Z
11 6 75.0 2022-02-11T00:00:00.000Z
12 6 71.0 2022-02-12T00:00:00.000Z
13 7 120.0 2022-02-14T00:00:00.000Z
14 7 110.0 2022-02-16T00:00:00.000Z
15 8 89.0 2022-02-19T00:00:00.000Z
16 9 30.0 2022-02-20T00:00:00.000Z
17 9 29.0 2022-02-22T00:00:00.000Z
18 10 100.0 2022-02-22T00:00:00.000Z

I want to know what was the average price and the number of products for sale and the number of sold products on a daily basis between 2022-01-23 and 2022-02-23.

In pseudo SQL it would be something like:

SELECT 
    COUNT(products_for_sale_this_day), 
    COUNT(products_sold_this_day), 
    AVG(price_of_products_for_sale_on_this_day) 
FROM 
    products ... 
WHERE 
    date "is between 2022-01-23 and 2022-02-23" 
GROUP BY 
    "dates in between"`

The result I would expect from the seed data would be:

Products for sale Number of sold Avg price Date
0 0 0.0 2022-01-23
0 0 0.0 2022-01-24
1 0 xx.xx 2022-01-25
2 0 xx.xx 2022-01-26
2 0 xx.xx 2022-01-27
3 0 xx.xx 2022-01-28
3 0 xx.xx 2022-01-29
3 1 92.0 2022-01-30
2 0 xx.xx 2022-01-31
4 1 xx.xx 2022-02-01
3 0 xx.xx 2022-02-02
3 0 xx.xx 2022-02-03
3 0 xx.xx 2022-02-04
3 0 xx.xx 2022-02-05
3 0 xx.xx 2022-02-06
3 0 xx.xx 2022-02-07
3 0 xx.xx 2022-02-08
3 0 xx.xx 2022-02-09
4 0 xx.xx 2022-02-10
4 0 xx.xx 2022-02-11
4 0 xx.xx 2022-02-12
4 1 xx.xx 2022-02-13
4 0 xx.xx 2022-02-14
4 1 xx.xx 2022-02-15
3 0 xx.xx 2022-02-16
3 0 xx.xx 2022-02-17
3 1 xx.xx 2022-02-18
3 0 xx.xx 2022-02-19
4 0 xx.xx 2022-02-20
4 0 xx.xx 2022-02-21
5 1 xx.xx 2022-02-22
4 0 xx.xx 2022-02-23

NOTE: I added xx.xx as I didn't want to manually calculate the AVG for every day in the example. On the 2022-01-30 the average price comes from the following products being for sale with the following prices:

  • Product ID 1, price at 2022-01-30: 100.0
  • Product ID 2, price at 2022-01-30: 85.0
  • Product ID 3, price at 2022-01-30: 91.0

AVG: (100 + 85 + 91) / 3 = 92

https://www.db-fiddle.com/f/jxQAqLvnKExjTVr16XyonG/0

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Niels Kristian
  • 8,661
  • 11
  • 59
  • 117
  • Can you also explain the logic that you using to calculate the ```products for sale``` and ```number of sold```. Based on which date you are calculating these metrics, is it coming from the ```products``` table or the ```product_prices``` table? Please clarify on these. – Teja Goud Kandula Jan 02 '23 at 14:37
  • No, so the number of `products for sale` at a given date is: Any `product` `WHERE created_at <= the_particular_date AND (sold_at >= the_particular_date OR sold_at IS NULL)` as `sold_at IS NULL` means that the product is still for sale (has not been sold). For `number of sold` that would be `WHERE sold_at = the_particular_date` for prices, see my comment on @jian's answer. – Niels Kristian Jan 02 '23 at 16:36

4 Answers4

2
SELECT d.the_day                                        AS "Date"
     , count(p.id)                                      AS "Products for sale"
     , count(p.id) FILTER (WHERE p.sold_at = d.the_day) AS "Number of sold"
     , round(coalesce(avg(pp.price), 0), 2)             AS "Avg price"
FROM  (
   SELECT ts::date AS the_day 
   FROM   generate_series (timestamp '2022-01-23'
                         , timestamp '2022-02-23'
                         , interval  '1 day') ts
   ) d
LEFT  JOIN products p ON p.created_at <= d.the_day
                     AND (p.sold_at < d.the_day) IS NOT TRUE
LEFT  JOIN LATERAL (
   SELECT pp.price
   FROM   product_prices pp
   WHERE  pp.product_id = p.id
   AND    pp.created_at <= d.the_day
   ORDER  BY pp.created_at DESC
   LIMIT  1
   ) pp ON true
GROUP  BY 1
ORDER  BY 1;

fiddle

I only join to products that are already listed and not yet sold to begin with. The day of the sale is included. Notably, products sold "today" are counted among "for sale" as well as among "sold", like your expected result demonstrates.

You show type date, but timestamptz literals. I went with dates like in your fiddle. Data types matter ...

Why did I use generate_series() this way?

About the LATERAL join:

About getting the latest applicable price:

About the aggregate FILTER clause:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1
SELECT
   G.DATE
  , COUNT(DISTINCT(products.id)) FOR_SALES
  , COUNT(DISTINCT( 
        CASE WHEN PRODUCTS.SOLD_AT::DATE=G.DATE 
        THEN products.id ELSE NULL END)) SOLD
  , AVG(PP.PRICE) AVE_PRICE
FROM
generate_series(date '2022-01-23' , date '2022-02-23' , interval '1 day') g (date)
LEFT OUTER JOIN
products  
    ON G.DATE BETWEEN PRODUCTS.CREATED_AT AND COALESCE(PRODUCTS.SOLD_AT,CURRENT_DATE)
INNER JOIN  
   (SELECT price,product_id,created_at price_created_at,coalesce(lead(created_at)over(partition by product_id order by ID asc),current_date)::TIMESTAMP-interval '1 second' to_date FROM product_prices) pp 
  ON pp.product_id = products.id and G.DATE between pp.price_created_at and pp.to_date 
GROUP BY  G.DATE

https://www.db-fiddle.com/f/hBBt3jcVnXXRV5pL1yn1yc/0

enter image description here

Adrian White
  • 1,720
  • 12
  • 14
0

The last table in Original post, last part should be like:

    date    | products_for_sale |
 2022-02-21 |                 4 |              
 2022-02-22 |                 5 |             
 2022-02-23 |                 4 |

Main query:

SELECT
    sub2.*
    , CASE WHEN p1.sold_at IS NOT NULL THEN
        1
    ELSE
        0
    END AS number_of_sold
FROM (
    SELECT
        date
        , count(id) AS products_for_sale
    FROM (
        SELECT
            date::date
            , p.*
        FROM
            generate_series(date '2022-01-23' , date '2022-02-23' , interval '1 day') g (date)
            LEFT JOIN products p ON date >= p.created_at
                AND date <= coalesce(sold_at , '2022-02-23')
            ORDER BY
                id
                , date) sub
    GROUP BY
        1) sub2
    LEFT JOIN products P1 ON p1.sold_at = sub2.date
ORDER BY
    sub2.date;

Make previous query as view (test_2) for simplicity. Then do one more time left join.

SELECT
    sub1.*, price
FROM (
    SELECT
        *
    FROM
        test_2) sub1
    LEFT JOIN ( SELECT DISTINCT ON (1 , 2)
            product_id
            , sold_at AS date
            , pp.created_at AS price_crearted_at
            , price
        FROM
            products p
            JOIN product_prices pp ON pp.product_id = p.id
        WHERE
            sold_at IS NOT NULL
        ORDER BY
            1
            , 2
            , 3 DESC) sub2 USING (date)
ORDER BY
    date;

If you join products with product_prices you will found out that same product have multiple price entry. Since in products, id is unique, meaning every product only sold once, so to get the latest price (order by price_crearted_at desc) of a product in product_prices, use distinct on and with order by price_crearted_at desc

jian
  • 4,119
  • 1
  • 17
  • 32
  • Okay, got it. That part might be unclear: So the point is, the price of a `product` at a given date, is the most recent entry in `product_prices` from that date OR earlier. Because the prices in `product_prices` table are added when a price of product changes (to track price history). So in product ID 1's case, it's the record with ID 1 from `product_prices` which is the "current price" at 2022-01-30. The same is the case for product ID 3. Is it more clear now? – Niels Kristian Jan 02 '23 at 16:33
  • @NielsKristian updated. let me know it works or not. – jian Jan 03 '23 at 04:42
  • Thanks for the response - and you are right about the table - I changed it in the post. The queries on AVG price does not seem to work though. I tried adding it here: https://www.db-fiddle.com/f/jxQAqLvnKExjTVr16XyonG/1 and it seems to be taking the price of the sold ones. What I need is the AVG price of ALL the products for sale that day with their respective price at that day (including those that got sold that day). Am I explaining it well enough? – Niels Kristian Jan 05 '23 at 09:36
  • https://dbfiddle.uk/A-4EOLQw let me know if this is what you want. @NielsKristian – jian Jan 05 '23 at 11:53
  • Thanks for the effort. But unfortunately this is not quite it. :-) – Niels Kristian Jan 18 '23 at 11:18
0
SELECT count(product_prices.product_id) as Productsforsale, count(sold_at) Numberofsold ,
avg(price) Avgprice , date(sold_at) as Date FROM products JOIN product_prices 
ON products.id = product_prices.product_id group by sold_at
order by sold_at

enter image description here

camille
  • 16,432
  • 18
  • 38
  • 60