4

I get a different result set for this query intermittently when I run it...sometimes it gives 1363, sometimes 1365 and sometimes 1366 results. The data doesn't change. What could be causing this and is there a way to prevent it? Query looks something like this:

SELECT * 
FROM 
    (
            SELECT  
                        RC.UserGroupId,
                        RC.UserGroup,
                        RC.ClientId AS CLID,                     
                        CASE WHEN T1.MultipleClients = 1 THEN RC.Salutation1 ELSE RC.DisplayName1 END AS szDisplayName,
                        T1.MultipleClients,
                        RC.IsPrimaryRecord,
                        RC.RecordTypeId,
                        RC.ClientTypeId,
                        RC.ClientType,
                        RC.IsDeleted,
                        RC.IsCompany,                                            
                        RC.KnownAs,
                        RC.Salutation1,
                        RC.FirstName, 
                        RC.Surname,                   
                        Relationship, 
                        C.DisplayName Client,
                        RC.DisplayName RelatedClient, 
                        E.Email,                                                            
                        RC.DisplayName  + ' is the ' + R.Relationship + ' of ' + C.DisplayName Description,
                        ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id


            FROM 
                        SSDS.Client.ClientExtended C 
                                                                              INNER JOIN 
                        SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID 
                                                                              INNER JOIN 
                        SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
                                                                              LEFT OUTER JOIN
                        SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId                 
                                                                              LEFT OUTER JOIN 
                        SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId 
                                                                              INNER JOIN                                                                 

                        (
                              SELECT 
                                          E.Email, 
                                          CASE WHEN (COUNT(DISTINCT RC.DisplayName) > 1) THEN 1 ELSE 0 END AS MultipleClients 

                              FROM
                                          SSDS.Client.ClientExtended C 
                                                                                                INNER JOIN 
                                          SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID 
                                                                                                INNER JOIN 
                                          SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
                                                                                                LEFT OUTER JOIN
                                          SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId                 
                                                                                                LEFT OUTER JOIN 
                                          SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId

                              WHERE 
                                          Relationship IN ('z-Group Principle', 'z-Group Member ')           
                                          AND E.Email IS NOT NULL

                              GROUP BY E.Email 

                        ) T1 ON E.Email = T1.Email


            WHERE 

                                                Relationship IN ('z-Group Principle', 'z-Group Member ')           
                                                AND E.Email IS NOT NULL                                         
    ) T


WHERE       
            sequence_id = 1
            AND T.UserGroupId IN (Select * from iCentral.dbo.GetSubUserGroups('471b9cbd-2312-4a8a-bb20-35ea53d30340',0))         
            AND T.IsDeleted = 0           
            AND T.RecordTypeId = 1 
            AND T.ClientTypeId IN
            (
                        '1',              --Client
                        '-1652203805'    --NTU                      
            )        

        AND T.CLID NOT IN
          (
            SELECT DISTINCT  
                               UDDF.CLID
            FROM
                   SLacsis_SLM.dbo.T_UserDef UD WITH (NOLOCK) 
                          INNER JOIN
                   SLacsis_SLM.dbo.T_UserDefData UDDF WITH (NOLOCK)
                   ON UD.UserDef_ID = UDDF.UserDef_ID
                          INNER JOIN
                   SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK)
                   ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID

            WHERE 
                           UD.UserDef_ID in    
                                    (
                                    'F68F31CE-525B-4455-9D50-6DA77C66FEE5',                                    
                                    'A7CECB03-866C-4F1F-9E1A-CEB09474FE47'  
                                    )

                           AND UDDF.Data = 'NO'
           ) 


ORDER BY T.Surname

EDIT:

I have removed all NOLOCK's (including the ones in views and UDFs) and I'm still having the same issue. I get the same results every time for the nested select (T) and if I put the result set of T into a temp table in the beginning of the query and join onto the temp table instead of the nested select then the final result set is the same every time I run the query.

EDIT2:

I have been doing some more reading on ROW_NUMBER()...I'm partitioning by email (of which there are duplicates) and ordering by Relationship (where there are only 1 of 2 relationships). Could this cause the query to be non-deterministic and would there be a way to fix that?

EDIT3:

Here are the actual execution plans if anyone is interested http://www.mediafire.com/?qo5gkh5dftxf0ml. Is it possible to see that it is running as read committed from the execution plan? I've compared the files using WinMerge and the only differences seem to be the counts (ActualRows="").

EDIT4:

This works:

SELECT * FROM 
(
   SELECT *, 
             ROW_NUMBER() OVER (PARTITION BY B.Email ORDER BY Relationship DESC) AS sequence_id
             FROM
            (

                                    SELECT DISTINCT  
                                    RC.UserGroupId,
                                    ...
            ) B...

EDIT5:

When running the same ROW_NUMBER() query (T in the original question, just selecting RC.DisplayName and ROW_NUMBER) twice in a row I get different rank for some people:

enter image description here

Does anyone have a good explanation/example of why or how ROW_NUMBER() over a result set that contains duplicates can rank differently each time it is run and ultimately change the number of results?

EDIT6:

Ok I think this makes sense to me now. This occurs when people 2 people have the same email address (e.g a husband and wife pair) and relationship. I guess in this case their ROW_NUMBER() ranking is arbitrary and can change every time it is run.

woggles
  • 7,444
  • 12
  • 70
  • 130
  • 2
    What is the isolation level this runs under? – Oded Aug 19 '11 at 14:28
  • Do you have any non-deterministic functions in your code? I've scanned this code, but you say it 'looks something like this', so assumed it wasn't exact. – p.campbell Aug 19 '11 at 14:29
  • @p.campbell: The SQL is the basically the same, I just removed some additional filtering conditions in the where in clause – woggles Aug 19 '11 at 14:31
  • 3
    @woggles: you override this with NOLOCK which is the same as READ UNCOMMITTED – gbn Aug 19 '11 at 14:33
  • Can you identify the 2 or 3 rows that sometimes show up and sometimes don't? You can probably copy the output to Excel, sort, and visually see where the matching stops (maybe just SELECT a key instead of SELECT *). – Aaron Bertrand Aug 19 '11 at 15:30
  • @Aaron, yep - I've identified the rows that don't appear. The script isn't just going to be run once off so I'll have to find a way to get it to work properly – woggles Aug 19 '11 at 16:05
  • `ROW_NUMBER` is indeed non deterministic if you have any duplicates for a ` E.Email, Relationship` combination. No idea whether that could cause your issue though. If this variability of results is easy to reproduce suggest you capture some actual execution plans and then when you have a discrepancy look at the actual number of rows to see where it becomes different. You can do a diff on the XML. – Martin Smith Aug 23 '11 at 22:57
  • The execution plans show that there are 31 vs 30 rows coming out of the `JOIN` on `INNER JOIN SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK) ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID`. Why are you still using `NOLOCK` when the comments and answer make it abundantly clear that this can cause the issue? -1 for wasting my time. – Martin Smith Aug 24 '11 at 10:18
  • @Martin, apologies - I reran the wrong query that still had the WITH (NOLOCK) ON in it. It still happens with no NOLOCKs. I have reuploaded the file to http://www.mediafire.com/?qo5gkh5dftxf0ml. 1st run returns 1381 results, 2nd one 1383 – woggles Aug 24 '11 at 11:27
  • @woggles - downvote removed. The clustered index seek on `T_UserDefData` seems to be where the discrepancy begins this time (609 vs 605 rows). Are you absolutely positive this table is not encountering any modification activity? – Martin Smith Aug 24 '11 at 11:45
  • Try denserank not rownumber which will deal with ties... – gbn Aug 24 '11 at 12:25
  • @Martin, checked in an isolated environment where the data isn't changing and I still have the same issue...I'm pretty sure it has something to do with `ROW_NUMBER` and duplicate rows. The differing number of rows doesn't happen if I nest the main select (T) as SELECT DISTINCT and do ROW_NUMBER over the distinct set (i'll update the question in a sec to show what I mean) – woggles Aug 24 '11 at 12:32

6 Answers6

10

Your use of NOLOCK all over means you are doing dirty reads and will see uncommitted data, data that will be rolled back, transient and inconsistent data etc

Take these off, try again, report back pleas

Edit: some options with NOLOCKS removed

  1. Data is really changing
  2. Some parameter or filter is changing (eg GETDATE)
  3. Some float comparisons running on different cores each time
    See this on dba.se https://dba.stackexchange.com/q/4810/630
  4. Embedded NOLOCKs in udfs or views (eg iCentral.dbo.GetSubUserGroups)
  5. ...
Community
  • 1
  • 1
gbn
  • 422,506
  • 82
  • 585
  • 676
  • I'm looking for the link, but there is also a post Remus made a few weeks back linking to a test that shows results can change even when the tables don't using `NOLOCK` – JNK Aug 19 '11 at 14:31
  • @JNK: yep, seen demos, you can still see nonrepeatable and phantom reads http://msdn.microsoft.com/en-us/library/ms189122.aspx – gbn Aug 19 '11 at 14:33
  • @gbn: removed all the the WITH (NO LOCK), but I'm still having the same issue. basically a different result set every time i run the query – woggles Aug 19 '11 at 14:37
  • 1
    @woggles - did you remove them ALL, including the subqueries? – JNK Aug 19 '11 at 14:39
  • yep - ill double check that none of the views that are queried use NOLOCK either: will follow up in a sec – woggles Aug 19 '11 at 14:40
  • I've double checked all views (ClientExtended, ClientRelationship, Email, UserDefinedData) and replaced the UDF with its result set but still no luck...thnks for all the help, will keep hunting – woggles Aug 19 '11 at 15:02
4

As I said yesterday in the comments the row numbering for rows with duplicate E.Email, Relationship values will be arbitrary.

To make it deterministic you would need to do PARTITION BY B.Email ORDER BY Relationship DESC, SomeUniqueColumn . Interesting that it changes between runs though using the same execution plan. I assume this is a consequence of the hash join.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
4

I think your problem is the first row over the partition is not deterministic. I suspect that Email and Relationship is not unique.

    ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id 

Later you examine the first row of the partition.

    WHERE   T.sequence_id = 1
        AND T.UserGroupId ... 

If that first row is arbitrary then you are going to get an arbitrary where comparison. You need to add to the ORDER BY to include a complete unique key. If there is no unique key then you need to make one or live with arbitrary results. Even on a table with a clustered PK the select row order is not guaranteed unless the entire PK is in the sort clause.

paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • Two other users have the same answer. When I started working on this those answers were not yet posted. – paparazzo Aug 24 '11 at 14:56
3

This probably has to do with ordering. You have a sequence_id defined as a row_number ordered by Relationship. You'll always get a sensible order by relationship, but other than that your row_number will be random. So you can get different rows with sequence_id 1 each time. That in turn will affect your where clause, and you can get different numbers of results. To fix this to get a consistent result, add another field to your row_number's order by. Use a primary key to be certain of consistent results.

user12861
  • 2,358
  • 4
  • 23
  • 41
2

There's a recent KB that addresses problems with ROW_NUMBER() ... see FIX: You receive an incorrect result when you run a query that uses the row_number function in SQL Server 2008 for the details.

However this KB indicates that it's a problem when parallelism is invoked for execution, and looking at your execution plans I can't see this kicking in. But the fact that MS have found a problem with it in one situation makes me a little bit wary - i.e., could the same issue occur for a sufficiently complicated query (and your execution plan does look sufficiently large).

So it may be worth checking your patch levels of SQL Server 2008.

Chris J
  • 30,688
  • 6
  • 69
  • 111
-1

U Must only use

Order by

without prtition by.

ROW_NUMBER() OVER (ORDER BY Relationship DESC) AS sequence_id 
bsiamionau
  • 8,099
  • 4
  • 46
  • 73
mohsen
  • 1