1

I have a database "inventory" with food items with the following columns:

index | date | category | subcategory | code | description | start | end

where the "date" is the date received into inventory. I want to select the record for each unique category and subcategory where "start" is null and the date is the oldest.

My initial attempt looked like this:

SELECT MIN(date) as date, category, subcategory, description, code, inventory.index
    FROM inventory 
    WHERE start is null
    GROUP BY category, subcategory

and it returned the record that I wanted for each unique category and subcategory, but with a few exceptions. If a record was added later which had an earlier date (so, it has a higher index than records with later dates), the query would correctly return the date from this record, but would return values from another record for the remaining columns, creating a hybrid row that doesn't exist in the table. It's like the "date" value is continuously checked and updated if necessary, but the other fields are not changed once values are loaded into the result. I'm hoping that someone can help me understand what's happening here.

I tried a few other approaches:

How to select data where a field has a min value in MySQL?

John Woo suggested using a nested SELECT clause inside the WHERE clause. So, mine looks like:

SELECT * 
                    FROM inventory
                    WHERE 
                    ( SELECT MIN(date) FROM inventory )
                    AND
                    start is null
                    GROUP BY category, subcategory

There seems to be a problem with the AND operator, and this only evaluates the "start is null" condition, and ignores the other regardless of which order I put them in. When I remove one of the conditions, the other evaluates correctly.

SQL query to select distinct row with minimum value

Ken Clark suggested using an INNER JOIN, which I modified to:

SELECT inv.*
FROM inventory inv
  INNER JOIN
  (
    SELECT MIN(date) AS date, category, subcategory, description, code, inventory.index
    FROM inventory
    WHERE start is null
    GROUP BY category, subcategory
  ) inv2
  ON inv2.index = inv.index
WHERE inv2.date = inv.date

This approach returns the same results as my initial attempt, except that the problematic rows are omitted (I guess because the dates are different when it comes time to evaluate the final WHERE clause).

I would appreciate any and all suggestions on where I can learn about what's happening in these cases. Thank you!

GMB
  • 216,147
  • 25
  • 84
  • 135
  • 2
    Update your post with sample data and expected output, to improve understanding of your post and allow debugging of solutions. – lemon May 23 '23 at 11:49
  • 1
    It seems likely that at least some of your issues are to do with imlicit grouping - columns which aren't part of either an aggregate expression, or part of the `GROUP BY` clause – Rob Eyre May 23 '23 at 11:50
  • As @lemon says, we need data to better understand your question. However, the issue may be that most DBMSs require that where there's a GROUP BY then all columns in the SELECT must either be in the GROUP by or aggregated. MySQL is one of the few (the only?) that doesn't require this and, instead, returns random values from the rows within each GROUP BY subset of data for the columns not being aggregated or in the GROUP BY – NickW May 23 '23 at 11:51

2 Answers2

1
SELECT MIN(date) as date, category, subcategory, description, code, inventory.index
FROM inventory 
WHERE start is null
GROUP BY category, subcategory

None of the three queries (initial attempt, then using a subquery in the WHERE clause or a JOIN) is valid standard SQL, because the columns in the GROUP BY and SELECT clauses are not consistent; in MySQL, they can only run with mode ONLY_FULL_GROUP_BY disabled - which is not good practice.

If we were to solve it with a subquery, we would correlate it with the outer query like so:

select * 
from inventory i
where i.index = (
    select i1.index 
    from inventory i1 
    where i1.category = i.category and i1.subcategory = i.subcategory and i1.start is null 
    order by i1.date, i1.index limit 1
)

The subquery brings the index of the row that you are looking for for the current category / subcategory tuple, so that the outer query can properly filter in the corresponding row.

If you are running MySQL 8.0 (or MariaDB >= 10.2), I would recommend window functions instead (so that the query can be executed in a single table scan):

select *
from (
    select i.*, 
        row_number() over(partition by category, subcategory order by i.date, i.index) rn
    from inventory i
    where start is null
) t
where rn = 1
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you GMB - your subquery solution also worked! I follow your approach, though I'll have to educate myself on what you were saying about the statements not being valid SQL. Is there a scenario in which Rob Eyre's answer would not return what I'm looking for? I also tried your window function suggestion, and got a syntax error: "#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '(partition by category, subcategory order by i.date, i.index) rn from inven' at line 4." – Jefferson Paul Jones May 23 '23 at 12:56
  • @JeffersonPaulJones: Rob Eyre's answer has the same problem as your queries as regard to `GROUP BY` (in the sense that it is not valid standard SQL), so it is not guaranteed to always return the correct result. As for window functions: you need MariaDB 10.2 or higher for this to work. – GMB May 23 '23 at 13:04
0

You can use a string concatenation trick to get the correlated column index for an aggregated function MIN(date), grouped by your category and subcategory. Once you have these index values of the selected records, you can just join to get the full row:

  SELECT inv.*
  FROM inventory inv
  INNER JOIN
  (
    SELECT SUBSTRING(MIN(CONCAT(`date`, `index`)), 11) AS "selectedIndex", `category`, `subcategory`
    FROM `inventory`
    WHERE `start` IS NULL
    GROUP BY `category`, `subcategory`
  ) inv2
  ON inv.`index` = inv2.`selectedIndex`

(Presuming your date column is a standard MySQL date)

Rob Eyre
  • 717
  • 7