0

I'm having trouble implementing a JOIN query on these two tables.

Product Table contains products. ProductInstance table contains version of product with pricing that were changed over time.

There should exactly be one row for each storeId even if it's null but should contain highest productInstanceId.

enter image description here

My Query can get rows with largest Ids for each product, but ignores StoreIds.

Select P.id as productId, PI.id as productInstanceId, P.name, PI.storeId, PI.price from products as P

LEFT JOIN productInstances AS PI
ON PI.productId = P.id

LEFT JOIN productInstances AS PI2
ON (PI2.productId = P.id and (PI.id < PI2.id )  )
WHERE PI2.id IS NULL
AND PI.id IS NOT NULL

Create Tables:

CREATE SCHEMA IF NOT EXISTS `TestDB` DEFAULT CHARACTER SET utf8 ;
USE `TestDB` ;

-- -----------------------------------------------------
-- Table `TestDB`.`products`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `TestDB`.`products` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45) NOT NULL,
  `description` VARCHAR(45) NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `TestDB`.`productInstances`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `TestDB`.`productInstances` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `storeId` INT NULL,
  `productId` INT NOT NULL,
  `price` INT NOT NULL,
  PRIMARY KEY (`id`, `productId`),
  INDEX `fk_productInstances_products_idx` (`productId` ASC),
  CONSTRAINT `fk_productInstances_products`
    FOREIGN KEY (`productId`)
    REFERENCES `TestDB`.`products` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

INSERT TEST DATA USING THIS QUERY

INSERT INTO `TestDB`.`products` (`name`) VALUES ('Product 1');
INSERT INTO `TestDB`.`products` (`name`) VALUES ('Product 2');

INSERT INTO `TestDB`.`productInstances` (`storeId`, `productId`, `price`) VALUES ('1', '1', '111');
INSERT INTO `TestDB`.`productInstances` (`productId`, `price`) VALUES ('1', '122');
INSERT INTO `TestDB`.`productInstances` (`productId`, `price`) VALUES ('1', '133');
INSERT INTO `TestDB`.`productInstances` (`storeId`, `productId`, `price`) VALUES ('1', '1', '115');
INSERT INTO `TestDB`.`productInstances` (`storeId`, `productId`, `price`) VALUES ('2', '1', '155');
forpas
  • 160,666
  • 10
  • 38
  • 76
user606669
  • 1,674
  • 7
  • 27
  • 39
  • 1
    What is your version of MySql? – forpas Aug 03 '20 at 18:33
  • Does this answer your question? [Left Outer Join doesn't return all rows from my left table?](https://stackoverflow.com/questions/4707673/left-outer-join-doesnt-return-all-rows-from-my-left-table) – philipxy Aug 03 '20 at 18:50
  • Please do not [repost questions](https://stackoverflow.com/q/63222513/3404097), edit them, including acting on the comments, including giving a cut & paste & runnable [mre], including saying what is the 1st subexpression that doesn't give what you expect & what you expected & why. – philipxy Aug 03 '20 at 18:51
  • Why without subqueries? – forpas Aug 03 '20 at 19:03
  • @forpas aren't subqueries inefficient as compare to join. – user606669 Aug 03 '20 at 19:28
  • 2
    In this case you have to join a subquery not the table itself. And also no, not always the subqueries are inefficient compared to joins. There is no such general rule. It depends on the requirement and the data. – forpas Aug 03 '20 at 19:33
  • @forpas (corrected) `AND PI.id IS NOT NULL` removes any rows with PI nulls added by `LEFT JOIN productInstances AS PI` to what an `INNER JOIN` would have returned. That is the duplicate issue. The asker does not justify why the given code would do what they want so there's no [mre]. Mimimal wrong code would presumably end after a minimal `WHERE`, where they don't get what they expect. One misconception they have is the duplicate & if they don't think that's their problem or only problem then they can edit their post to justify that. Which involves addressing the input to a WHERE. – philipxy Aug 03 '20 at 21:25
  • @philipxy to clarify what I say. The accepted answer in the proposed duplicate suggests correctly that the conditions of the WHERE clause should be moved to the ON clause: *...The solution is to change WHERE to AND so that the condition is part of the join condition, not a filter applied after the join...* If the OP applies this to the query the problem will not be solved because the double LEFT join is not the correct way to go for this requirement. – forpas Aug 03 '20 at 21:33
  • @forpas Answers at that duplicate & at many other duplicates like the one I posted on the duplicate explain the misconception. – philipxy Aug 03 '20 at 22:29

1 Answers1

1

For MySql 8.0+ you can use ROW_NUMBER() window function to get the max id for each productid and storeid:

select p.id productId, 
       pi.id productInstanceId, 
       p.name, 
       pi.storeId, 
       pi.price 
from products p 
left join (
  select *, row_number() over (partition by storeId, productId order by id desc) rn
  from productInstances
) pi on pi.productId = p.id and pi.rn = 1

For previous versions you can do the same with a correlated subquery:

select p.id as productId, 
       pi.id as productInstanceId, 
       p.name, 
       pi.storeId, 
       pi.price 
from products p 
left join (
  select pi.* from productInstances pi
  where pi.id = (
    select max(id) 
    from productInstances 
    where productId = pi.productId and storeId <=> pi.storeId 
  )
) pi on pi.productId = p.id

See the demo.
Results:

> productId | productInstanceId | name      | storeId | price
> --------: | ----------------: | :-------- | ------: | ----:
>         1 |                 3 | Product 1 |    null |   133
>         1 |                 4 | Product 1 |       1 |   115
>         1 |                 5 | Product 1 |       2 |   155
>         2 |              null | Product 2 |    null |  null
forpas
  • 160,666
  • 10
  • 38
  • 76