0

I have a table that I am trying to find duplicate rows using a Common Table Expression. The fields that I working with are as follows:

LogTime (DataType: datetime2(7),null) ControllerIP (DataType: nvchar(max),null)

I have two rows of data that have the same data in them, as far as I can tell. I did a LEN check on both columns to make sure they are equal lengths as well, yet the rows do not come back as duplicates when using the CTE below. Is there something different I need to do with the LogTime column ? I have never run into this.

WITH CTE AS 
(
SELECT rn = ROW_NUMBER() 
            OVER( 
              PARTITION BY LogTime , ControllerIP
              ORDER BY Id ASC), * 
FROM [DownTime].[dbo].[Records]
) 
SELECT * FROM cte 
WHERE   FileName = '141101.CSV' AND rn > 1 
Order By ID
GO

Also, I am using Microsoft SQL Server 2008R2.

Bill Greer
  • 3,046
  • 9
  • 49
  • 80
  • `LEN` doesn't count trailing spaces (and other non printable chars), so 2 values with the same len and that looks the same, doesn't mean that they have the same data – Lamak Nov 14 '14 at 14:41
  • 1
    Logically, your strategy to find duplicates is sound. Either LogTime or ControllerIP must be different. – Daniel Gimenez Nov 14 '14 at 14:41
  • Okay. I thought it did. I'll google how to do so and go down that path. – Bill Greer Nov 14 '14 at 14:42
  • 1
    Double check the dates, datetime2 has 6 digits of precision – Alex K. Nov 14 '14 at 14:46
  • Maybe this will help: http://stackoverflow.com/questions/1151693/len-vs-datalength-in-sqlserver-2005 – Tab Alleman Nov 14 '14 at 14:47
  • 1
    You should use column names instead of *. I had to look at this query long and hard to figure out how it even worked because the trailing ", *" was really hard to see. – Sean Lange Nov 14 '14 at 14:49

1 Answers1

1

Your plan is sound. If you're not finding duplicates, it's because duplicates don't exist. You can apply some functions to the columns to make finding duplicates more likely, such as trimming spaces from the IP and reducing the precision of the datetime2.

WITH CTE AS (
    SELECT rn = ROW_NUMBER() OVER( 
        PARTITION BY CAST(LogTime AS datetime2(2)), RTRIM(LTRIM(ControllerIP))
        ORDER BY Id ASC), * 
    FROM [DownTime].[dbo].[Records]
) 
SELECT * FROM cte 
WHERE FileName = '141101.CSV' AND rn > 1 
Order By ID
Daniel Gimenez
  • 18,530
  • 3
  • 50
  • 70
  • The way I eventually got this to work is by running my objects through a c# console application. I used the string.trim() method for each object and saved it back to the database. This cleaned up the data and I started seeing the duplicates where I expected. – Bill Greer Nov 19 '14 at 12:13