This is an odd question to be sure, but I'm a bit flummoxed for an explanation for this behavior:
Background: (not required to know)
So to start, I was writing a quick query and was pasting a list of UNIQUERIDENTIFIER
and wanted them to be uniform inside of a WHERE X IN (...)
clause. In the past I used an empty UNIQUERIDENTIFIER
(all zeros) at the top of the list so that I could paste a uniform set of UNIQUERIDENTIFIER
that look like: ,'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
. This time, to avoid tapping the zeros, I inserted a NEWID()
thinking that the odds of a collision were nearly impossible, to my surprise that resulted thousands of additional results, like 50+% of the table.
Begin question: (part you do need to know)
This query:
-- SETUP: (i boiled this down to the bare minimum)
-- just creating a table with 500 PK UNIQUERIDENTIFIERs
IF (OBJECT_ID('tempdb..#wtfTable') IS NOT NULL) DROP TABLE #wtfTable;
CREATE TABLE #wtfTable (WtfId UNIQUEIDENTIFIER PRIMARY KEY);
INSERT INTO #wtfTable
SELECT TOP(500) NEWID()
FROM master.sys.all_objects o1 (NOLOCK)
CROSS JOIN master.sys.all_objects o2 (NOLOCK);
-- ACTUAL QUERY:
SELECT *
FROM #wtfTable
WHERE [WtfId] IN ('00000000-0000-0000-0000-000000000000', NEWID());
... should statistically produce bupkis. But if you run it ten times or so you will sometimes get massive selections. For instance, in this last run I received 465/500 rows, meaning over 93% of the rows were returned.
While I understand the NEWID()
will be recalculated on a per row basis, there's not a statistical chance in hell that it hits that much. Everything I wrote here is required to produce the nuanced SELECT
, removal of anything will prevent it from happening. Incidentally, you can replace the IN
with a WHERE WtfId = '...' OR WtfId = NEWID()
and still receive the same results. I am using SQL SERVER 2014 Standard patched up to date, with no odd settings activated that I know of.
So does anyone out there know what's up with this? Thanks in advance.
Edit:
The '00000000-0000-0000-0000-000000000000'
is a red herring, here's a version that works with integers: (interesting note, I needed to raise the table size to 1000 with integers to produce the problematic query plan...)
IF (OBJECT_ID('tempdb..#wtfTable') IS NOT NULL) DROP TABLE #wtfTable;
CREATE TABLE #wtfTable (WtfId INT PRIMARY KEY);
INSERT INTO #wtfTable
SELECT DISTINCT TOP(1000) CAST(CAST('0x' + LEFT(NEWID(), 8) AS VARBINARY) AS INT)
FROM sys.tables o1 (NOLOCK)
CROSS JOIN sys.tables o2 (NOLOCK);
SELECT *
FROM #wtfTable
WHERE [WtfId] IN (0, CAST(CAST('0x' + LEFT(NEWID(), 8) AS VARBINARY) AS INT));
or you could just replace the literal UNIQUEIDENTIFIER
and do this:
DECLARE @someId UNIQUEIDENTIFIER = NEWID();
SELECT *
FROM #wtfTable
WHERE [WtfId] IN (@someId, NEWID());
both produce the same results... the question is Why does this happen?