7

DISCLAIMER: This is not a general question about the problems of NOLOCK (and therefore not a duplicate of Is the NOLOCK (Sql Server hint) bad practice?); it is a specific question about how NOLOCK and DISTINCT interact in an attempt to better understand SQL Server's inner workings.

As strange as it may seem, it appears to me that NOLOCK may be causing DISTINCT to fail in a certain case. Here is the example:

INSERT INTO TableA (ID)
SELECT DISTINCT ID
FROM TableB WITH (NOLOCK)

The above example occasionally produces a PK violation. Here are the other relevant facts:

  • PK of TableA is ID
  • PK of TableB is ID
  • TableA is empty when this starts.
  • Nothing else is writing to TableA during this time.
  • There are updates happening to TableB while the above is running.

My working theory is that 1) the updates on TableB combined with the use of NOLOCK are causing duplicate data, and 2) the optimizer is relying on the fact that TableA has a PK on the same column that we are DISTINCTing, and so doesn't actively perform a DISTINCT operation on the rows that are being returned, it just assumes the rows will already be distinct.

Can anyone confirm this? And if so, is this by design, or a bug in SQL Server?

I originally thought even with dirty reads and the possibility of duplicate rows that DISTINCT would be a guarantee to clean up the duplicates, but the evidence I'm seeing seems to indicate otherwise.

This error was seen on SQL Server 2008R2.

JohnnyM
  • 28,758
  • 10
  • 38
  • 37
  • nolock can cause a host of problems, it is really best to avoid it whenever possible. Side affects can include duplicate records and missing records. What you experience depends on mutations that are currently taking place on the object being read from. See also https://stackoverflow.com/q/1452996/1260204 – Igor Oct 19 '17 at 17:21
  • Possible duplicate of [Is the NOLOCK (Sql Server hint) bad practice?](https://stackoverflow.com/questions/1452996/is-the-nolock-sql-server-hint-bad-practice) – Igor Oct 19 '17 at 17:22
  • You might get await with NOLOCK hints for non-critical reads, however, this is an example of an update/insert command that you would want to avoid using a NOLOCK with. https://stackoverflow.com/questions/210171/effect-of-nolock-hint-in-select-statements – Ross Bush Oct 19 '17 at 17:26
  • I understand the general problems with NOLOCK (data accuracy, possible duplicates, possible missed rows), but this question is specifically about the possibility of NOLOCK causing the DISTINCT command to not do its job. I'm hoping someone knows the answer to this. That is, I'm not just seeking how to make this code work, but I want to understand what SQL Server is doing in this case and why. Having a DISTINCT fail kind of rocks my world as far as what I thought SQL Server was supposed to guarantee. – JohnnyM Oct 19 '17 at 18:03
  • 1
    While I wholeheartedly agree that NOLOCK is a bad practice I do not think this is a duplicate. The OP states they understand nolock and wants to know why duplicates can appear using that hint. – Sean Lange Oct 19 '17 at 18:27
  • Not an answer to your question, but I wonder if this would work around this behavior if you insist on still using NOLOCK (why not RCSI?): INSERT INTO TableA (ID) SELECT DISTINCT ID FROM ( SELECT DISTINCT ID FROM TableB WITH (NOLOCK) ) t – Tara Kizer Oct 19 '17 at 20:11

1 Answers1

5

Sure this can happen. The engine is smart enough to know that since ID is your primary key it isn't going to waste resources looking for duplicates. However, you have introduced the dreaded NOLOCK hint. And you said that TableB is being updated during this process.

What you are almost certainly experiencing here is one the side affects of NOLOCK brought on by page splits. These page splits can cause the engine to return duplicate rows and as I said before the engine assumes you have no duplicates because you are selecting the primary key and there can't be duplicates. This is NOT a bug in sql server, it is yet another reason so stop using the hint.

Sean Lange
  • 33,028
  • 3
  • 25
  • 40
  • I think you are right, but do you know of any documentation to this effect? – JohnnyM Oct 19 '17 at 19:59
  • Actually I found a forum over at SSC with much the same. And Gail Shaw just stated much the same as I did. https://www.sqlservercentral.com/Forums/Topic1234650-391-1.aspx – Sean Lange Oct 19 '17 at 20:03
  • 2
    Great link, thanks! When Gail Shaw and Paul White chime in on something, that pretty much seals it. I do wish there was some documentation from Microsoft in this regard, or even better in my mind, a carefully crafted, perfectly illustrated Paul White article. :) – JohnnyM Oct 19 '17 at 20:56
  • I knew I had read about this in the past and was surprised. Glad I could scare up a link from some reputable people on the topic. – Sean Lange Oct 19 '17 at 21:05