-1

I have

  1. order table with columns
    • id
    • date
    • supplier_id
  2. order_lineitem table with columns
    • id
    • order_id
    • article_id
    • order_quantity
    • order_price
  3. a prices table with columns
    • id
    • article_id
    • supplier_id
    • valid_until
    • minimum_order_quantity
    • list_price

The prices table doesn't necessarily have to have a matching / valid entry, so this one would have to be joined via an outer join.

I'd like to compare order_prices against list_prices.

Therefore I need to somehow join

SELECT 
   o.id,
   o.date,
   ol.article_id,
   ol.order_quantity,
   ol.order_price,
   p.list_price
FROM 
   `order` o JOIN order_lineitem ol on ol.order_id = o.id
   LEFT OUTER JOIN prices p on 
           p.article_id = ol.article_id
       AND p.supplier_id = o.supplier_id
       AND p.minimum_order_quantity <= ol.order_quantity
       AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
       /* here comes the fun part that doesn't work (reliably) */
       ORDER BY 
          IFNULL(p.valid_until, DATE('2099-12-31')) asc,
          p.minimum_order_quantity desc
       GROUP BY o.id, ol.id, p.article_id
       /* ... trying to get only THAT price from the prices table that applies for the 
          (a) the given article
          (b) from the given supplier
          (c) that was valid at the time of purchase (i.e. has the smallest "valid_until" date that is greater than the purchase date)
          (d) when ordering the given quantity (prices can also increase with higher quantities, so it has to be the price with the largest minimum_order_quantity that is smaller than the ordered quantity)
       */

I particularly don't want to fall into the trap (which I dug for myself here) of using group by to limit the results to 1 record from the prices table based on a previous sorting, since

(i) as per MySQL documentation it is non-deterministic which record will actually get returned (although it may in effect often work and this is a frequently suggested route to go) - also see this excellent explanation on the issue: https://stackoverflow.com/a/14770936/9818188 and

(ii) this concept wouldn't work on other SQL implementations like SQL Server, Maria DB & Co.

The question is not around putting in a nested query in order to be able to ORDER first and then GROUP subsequently. It's more about how to really properly get the correct row--ideally also working on other SQL implementations like SQL Server, Maria DB or Google BigQuery.

And since I can't really rely on prices being cheaper the more I buy I also can't simply get the min(list_price).

How can this can be achieved?

Since the output of this query is required for downstream processing, I can't slice & dice the task but need a full list of all orders with respective list prices.

EDIT Here is a SQL fiddle - the desired prices are shown in column order_price, the prices incorrectly determined by the JOIN (excluding the order byclause - as this would cause non-deterministic results) are shown in column list_price: http://sqlfiddle.com/#!9/f03a4f/2



CREATE TABLE `order`
    (`id` int, `date` datetime, `supplier_id` int)
;
    
INSERT INTO `order`
    (`id`, `date`, `supplier_id`)
VALUES
    (1, '2022-01-15 00:00:00', 1),
    (2, '2022-02-15 00:00:00', 1),
    (3, '2022-03-15 00:00:00', 1),
    (4, '2022-01-15 00:00:00', 2),
    (5, '2022-02-15 00:00:00', 2),
    (6, '2022-03-15 00:00:00', 2)
;


CREATE TABLE order_lineitem
    (`id` int, `order_id` int, `article_id` int, `order_quantity` int, `order_price` int)
;
    
INSERT INTO order_lineitem
    (`id`, `order_id`, `article_id`, `order_quantity`, `order_price`)
VALUES
    (1, 1, 1, 1, 11),
    (2, 1, 1, 10, 8),
    (3, 1, 1, 100, 9),
    (4, 2, 1, 1, 15),
    (5, 2, 1, 10, 12),
    (6, 2, 1, 100, 13),
    (7, 3, 1, 1, 17),
    (8, 3, 1, 10, 14),
    (9, 3, 1, 100, 16),
    (10, 4, 1, 1, 10),
    (11, 4, 1, 10, 80),
    (12, 4, 1, 100, 80),
    (13, 5, 1, 1, 10),
    (14, 5, 1, 10, 80),
    (15, 5, 1, 100, 80),
    (16, 6, 1, 1, 10),
    (17, 6, 1, 10, 10),
    (18, 6, 1, 100, 10)
;


CREATE TABLE prices
    (`id` int, `article_id` int, `supplier_id` int, `valid_until` varchar(10), `minimum_order_quantity` int, `list_price` int)
;
    
INSERT INTO prices
    (`id`, `article_id`, `supplier_id`, `valid_until`, `minimum_order_quantity`, `list_price`)
VALUES
    (1, 1, 1, '2022-01-31', 1, 11),
    (2, 1, 1, '2022-01-31', 10, 8),
    (3, 1, 1, '2022-01-31', 100, 9),
    (4, 1, 2, NULL, 1, 10),
    (5, 1, 1, '2022-02-31', 1, 15),
    (6, 1, 1, '2022-02-31', 10, 12),
    (7, 1, 1, '2022-02-31', 100, 13),
    (8, 1, 1, NULL, 1, 17),
    (9, 1, 1, NULL, 10, 14),
    (10, 1, 1, NULL, 100, 16),
    (11, 2, 1, NULL, 1, 99),
    (12, 1, 2, '2022-02-31', 10, 80)
;

SELECT 
   o.id,
   o.supplier_id,
   o.date,
   ol.article_id,
   ol.order_quantity,
   ol.order_price,
   p.list_price
FROM 
   `order` o JOIN order_lineitem ol on ol.order_id = o.id
   LEFT OUTER JOIN prices p on 
           p.article_id = ol.article_id
       AND p.supplier_id = o.supplier_id
       AND p.minimum_order_quantity <= ol.order_quantity
       AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
       /* here comes the fun part that doesn't work (reliably) */
       /* NOTE: I am purposesly commenting out the ORDER BY clause here, because
          (a) it would have to go after GROUP BY - requiring a nested table which I would like to prevent AND, more importantly,
          (b) limiting the numer of rows returned to 1 by GROUPing with an incomplete set of columns on a sorted table may return non-deterministic results as per the MySQL documentation.
          see also https://stackoverflow.com/a/14770936/9818188 explaining the issue with GROUP BY in this context 
       #
       # ORDER BY 
       #   IFNULL(p.valid_until, DATE('2099-12-31')) asc,
       #   p.minimum_order_quantity desc
       */
       GROUP BY o.id, ol.id, p.article_id
       /* ... trying to get only THAT price from the prices table that applies for the 
          (a) the given article
          (b) from the given supplier
          (c) that was valid at the time of purchase (i.e. has the smallest "valid_until" date that is greater than the purchase date)
          (d) when ordering the given quantity (prices can also increase with higher quantities, so it has to be the price with the largest minimum_order_quantity that is smaller than the ordered quantity)
       */
Fabian K
  • 59
  • 1
  • 7
  • 1
    The `GROUP BY` has to be before the `order BY`but you need to supply a [mre] With tables and data wanted result. – nbk Jul 10 '22 at 23:50
  • @nbk thank you your quick reply. I've added an example. However, as stated, I know that GROUP BY would have to go before the ORDER BY - but that is not at all the issue here. Simply swapping the lines would return incorrect results. AND I am seeking a proper query that isn't affected by non-deterministic results (GROUPing on incomplete set of columns on ordered table) -- see https://stackoverflow.com/a/14770936/9818188 – Fabian K Jul 11 '22 at 07:22
  • 1
    That is a known problem, so wrap the columns that are not in the `Group by` in aggregation functions and it becomes deterministic. Other databases don't allow constructions like yours – nbk Jul 11 '22 at 09:02
  • Hi @nbk thank you again for the quick reply. but unfortunately again not really answering my core question. How do I properly perform the sub-selection of the correct price given the constraints. First question to be answered is: which prices were valid @order. That requires a `MIN` on the `IFNULL(p.valid_until, ...) >= o.date` part. Second, determining the matching price for the quantity, requires a `MAX` on the `p.minimum_order_quantity <= ol.order_quantity` part. And still we are debating about `GROUP BY` which I stated why is not viable as it is non-deterministic. So what would be correct? – Fabian K Jul 11 '22 at 09:24
  • @nbk if I simply do `MIN(IFNULL(p.vali_until, ...))` and `MAX(p.minimum_order_quantity)` I will not necessarily get a valid COMBINATION of the two, but simply the minimum and maximum available out of all candidate solutions. Which would certainly not be correct. So I somehow have to tie those constraints together. – Fabian K Jul 11 '22 at 09:27
  • Seeing you design, You subselect the price table, with all parameters needed from orders add a ROW_NUMBER with the order by you need and exclude all prices that are not valid any more in a where clause, then you join the rownumber is 1 – nbk Jul 11 '22 at 09:43
  • In case of mysql 5.7 you still need an order by but you use LIMIT 1 – nbk Jul 11 '22 at 09:47
  • @nbk unfortunately that's not really what is happening is it.. 1.) How could I introduce a limit 1 within the join? It's not what I am currently doing. 2.) If you run the SQLFiddle you will see that the results are incorrect (compare order_price vs list_price. All should be equal). 3.) I still have multiple list price candidates out of which MySql randomly chooses at the moment. Apologies if I'm just not getting your point yet – Fabian K Jul 11 '22 at 11:39
  • @FabianK I'm a bit confused does something like http://sqlfiddle.com/#!9/f03a4f/4 work? What do you mean by incomplete set of columns if you're only wanting the chosen list_price? – Will B. Jul 13 '22 at 02:05
  • Hi @WillB., thank you so much for your suggestion! Yes, that does return the correct values, however I fear that it is extremely costly as the subquery needs to be evaluated for each and every order line item. I was actually wondering if there's a more performant way of achieving the desired output. With "incomplete set of columns" I mean the issues inflicted by the usage of GROUP on an ordered table while MySQL doesn't have a deterministic algo to return values on non-aggregated columns that are not always the same. See https://stackoverflow.com/a/14770936/9818188 & follow-up answers there – Fabian K Jul 13 '22 at 10:57
  • I see, I am familiar with the limitations with `ONLY_FULL_GROUP_BY` and several workarounds to produce desired results, such as in my answer for [Select one column DISTINCT with corresponding columns](https://stackoverflow.com/a/26124759/1144627). However, I don't see a way to avoid the per line-item evaluation costs, given the criteria of needing the single nearest `list_price` within a range. I'll give it some thought and see if I can come up with something. – Will B. Jul 13 '22 at 13:28
  • @FabianK With the current structure of the data and given criteria, I'm not able to determine a less costly query that produces accurate results, as there is no way to avoid the evaluation without becoming non-deterministic. To ensure compatibility of multiple RDBMs, I suggest refactoring to include the `price.id` reference used in the `order_lineitem` table, running the costly query to get the legacy references, which adheres more to the standard relational schema approach. – Will B. Jul 14 '22 at 16:21

1 Answers1

0

If you are interrestd in the highest listprice, you would do it like the.

If you need also other columns from theprices table, you need to SQL select only rows with max value on a column

as you have to join the sub querys for all articles

SELECT 
   o.id,
   o.date,
   ol.article_id,
   ol.order_quantity,
   ol.order_price,
     (SELECT  `list_price`  FROM prices p WHERE 
              p.article_id = ol.article_id
       AND p.supplier_id = o.supplier_id
       AND p.minimum_order_quantity <= ol.order_quantity
       AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
ORDER BY `list_price` DESC
LIMIT 1
   ) list_price
FROM 
   `order` o JOIN order_lineitem ol on ol.order_id = o.id
nbk
  • 45,398
  • 8
  • 30
  • 47
  • Hi @nbk thank you for your suggestion, but as I stated, I am not interested in the lowest or highest price .. I am precisely looking for the price that satisfies the outlined conditions. So this solution would unfortunately return incorrect results. You can safely assume that there is only one price valid for each combination of `p.valid_until`, `p.minimum_order_quantity`, `p.article_id` and `p.supplier_id`. However, getting exactly that one is the challenge here. Thank you ! – Fabian K Jul 11 '22 at 15:41
  • i think you will only get one price per article only if there are multiple Rows that fitt the criteria, you will get the highest price. The conditions of your join are completely. Of course3 you must test it further, but when you got only one fitting price, you will get it as i said very clearly – nbk Jul 11 '22 at 16:03
  • Hi @nbk there are typically several generally matching rows. As you don't know (a) since when a price is valid (only _until_ when) AND (b) you don't know up to what quantity a price is valid (only _from_ which quantity) you typically get a series of prices. As I stated it is not necessarily the more you buy the cheaper it gets. It is also not the "younger" the price the cheaper. So there is no way to aggregate on the price column - I really need a solution for solving the question (which is the LATEST valid price where o.date <= p.valid_until AND the SMALLEST MOQ >= ol.order_quantity – Fabian K Jul 12 '22 at 19:56