Select rows with missing value for each item

Question

I'm trying to make a report to find rows in a table, which have a mistake, a missing item order. I.e.

ID   Item  Order
----------------
 1     A       1
 2     A       2
 3     A       3
 4     B       1
 5     B       2
 6     C       2
 7     C       3
 8     D       1

Note, that Item "C" is missing row with Order index "1". I need to find all items, which are missing index "1" and start with "2" or other. One way I figured is this:

SELECT DIstinct(Item) FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE Order = 1)

But surprisingly (to me), it does not give me any results even though I know I have such items. I guess, it first selects items wich are not in sub-select and then distincts them, but what I wanted to is select distinct Items and find which of them have no lines with "Order = 1".

Also, this code is to be executed over some 70 thousands of lines, so it has to be feasible (another way I can think of is a CURSOR, but that would be very slow and possibly unstable?).

Regards,

Oak

DISTINCT is not a function (on a column), it's a part of SELECT DISTINT - and works on the whole select rows. "select distinct(a), b" is the same as "select distinct a, b" which is the same as "select distinct a, (b)" etc. — jarlh, Dec 01 '15 at 13:45
Yes, I agree, however I'm not sure how it is related to the code above? It is related to result set row and in this case it only contains one column. — Oak_3260548, Dec 01 '15 at 17:16
The column you select is the whole row, so in your specific case it doesn't matter. But it's a very bad habit to have parentheses around the column name. Don't confuse yourself or anybody else, just do SELECT DISTINCT Item FROM etc. — jarlh, Dec 02 '15 at 07:37

score 3 · Answer 1 · answered Dec 01 '15 at 13:38

3

The idea is sound, but there is one tiny detail with NOT IN that may be problematic. That is, if the subquery after NOT IN results in any NULLs, the NOT IN is evaluated as if it were false. This may be the reason why you get no results. You can try NOT EXISTS, like in the other answer, or just

SELECT DISTINCT Item FROM ITEMS as I WHERE I.Item NOT IN (SELECT Item FROM Items WHERE Order = 1 AND Item IS NOT NULL)

answered Dec 01 '15 at 13:38

marmarta

838
5
20

No, actually it does something. Because if subquery after NOT IN returns any `NULLs` the NOT IN is evaluated as false - see for example http://stackoverflow.com/questions/129077/not-in-clause-and-null-values – marmarta Dec 01 '15 at 13:41
Yes, indeed I am. Because if there is a single `NULL` item, the `(SELECT Item FROM Items WHERE Order = 1)` will return as one of its values `NULL` - and this will cause the entire query to return no values. Try it: `SELECT 1 WHERE 1 NOT IN (2,3)` returns a single row, while `SELECT 1 WHERE 1 IN (2,3,NULL)` returns zero rows. – marmarta Dec 01 '15 at 13:50
Good point, forgotten because i'm almost only use `NOT EXISTS` for such reasons +1 – Tim Schmelter Dec 01 '15 at 14:16
This answer is identical to anotherone bellow and is correct. Thank you Marta! – Oak_3260548 Dec 02 '15 at 03:39

Tim Schmelter · Answer 2 · 2015-12-01T14:24:31.667

2

You can use NOT EXISTS:

SELECT DISTINCT(i1.Item) FROM ITEMS i1
WHERE NOT EXISTS
(
    SELECT 1 FROM Items i2 
    WHERE i1.Item = i2.Item AND i2.[Order] = 1
)

NOT IN has it's issues, worth reading:

http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

The main problem is that the results can be surprising if the target column is NULLable (SQL Server processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to – or not equal to – the reference on the left side). Also, optimization can behave differently if the column is NULLable, even if it doesn't actually contain any NULL values

because of this...

Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival it in terms of performance, when all other variables are the same, but all of the other methods introduce either performance problems or other challenges.

edited Dec 01 '15 at 14:24

answered Dec 01 '15 at 13:32

Tim Schmelter

450,073
74
686
939

what if 1 exists but lets say 2 or other number is missing , how to do that ? as he said `start with "2" or other` this will not work – Moudiz Dec 01 '15 at 13:47
@Moudiz: he said _"missing index "1" and start with "2" or other"_. That just meant that this item-group contains no one with order=1 but only with any other number. That's exactly what the `NOT EXISTS` checks – Tim Schmelter Dec 01 '15 at 13:59
This is correct, any other missing number is not a problem, I only need items with "1" missing. But I tested the above query, it gives a sound result set of 68 items, but it does not work properly. It does returns items with "1"s. I'm not sure what we actually did with the above query. – Oak_3260548 Dec 01 '15 at 17:14
@user: that cannot be, maybe your original query is not the same – Tim Schmelter Dec 01 '15 at 18:10
I'm not sure why but it returns all sorts of Items: Ones with full Order sequense (i.e. 1,2,3), ones with only "1", ones which have NULL sequence. But not what I needed. I won't test it anymore, as I have two tested working solutions as marked bellow. Thank you anyway, Tim. – Oak_3260548 Dec 02 '15 at 03:36

score 2 · Accepted Answer · edited Jan 18 '21 at 12:30

You can find the missing orders using a HAVING clause. HAVING allows you to filter on aggregated records. In this case we are filtering for Items with a min Order in excess of 1.

The benefit of this approach over a sub query in the WHERE clause is SQL Server doesn't have to rerun the sub query multiple times. It should run faster on large datasets.

Example

/* HAVING allows us to filter on aggregated records. 
 */
WITH SampleData AS
    (
        /* This CTE creates some sample records 
         * to experiment with.
         */
        SELECT
            r.*
        FROM
            (
                VALUES
                    ( 1,     'A',       1),
                    ( 2,     'A',       2),
                    ( 3,     'A',       3),
                    ( 4,     'B',       1),
                    ( 5,     'B',       2),
                    ( 6,     'C',       2),
                    ( 7,     'C',       3),
                    ( 8,     'D',       1)
            ) AS r(ID, Item, [Order])
    )
SELECT
    Item,
    COUNT([Order])        AS Count_Order,
    MIN([Order])        AS Min_Order
FROM
    SampleData
GROUP BY
    Item
HAVING 
    MIN([Order]) > 1
;

Like other solutions this one can't find other breaks in the order sequence. If you consider 1, 2, 4 an error, because of the missing 3, you will need another approach. Research tally tables for more on this... — David Rushton, Dec 01 '15 at 13:44
Your comment is correct, but my question was really about first order sequence missing. Only "leading" order missing is a problem. — Oak_3260548, Dec 02 '15 at 03:04
Tested, this is a working solution. Upvoted and marked as answer! Thank you Release Object! — Oak_3260548, Dec 02 '15 at 03:26

score 2 · Answer 4 · answered Dec 01 '15 at 13:44

2

Your query should work. The problem is probably that Item could be NULL. So try this:

SELECT Distinct(Item)
FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE Order = 1 AND Item IS NOT NULL);

This is why NOT EXISTS is preferable to NOT IN.

I would do this, though, with an aggregation query:

select item
from items
group by item
having sum(case when [order] = 1 then 1 else 0 end) = 0;

answered Dec 01 '15 at 13:44

Gordon Linoff

1,242,037
58
646
786

This answer should be marked as answer, because it works. Really, the only issue was " AND Item IS NOT NULL". I marked another answer as answer already, but this is equal in terms of result set! Thank you Gordon! – Oak_3260548 Dec 02 '15 at 03:33

Select rows with missing value for each item

4 Answers4