0

I've the following query:

INSERT INTO [Table A]  ([student_name], [class_id],[contact_detail], [birth_date],[note_average])
            SELECT [student_name] = case when CHARINDEX('.', [student_name])>0 then LEFT([student_name],CHARINDEX('.', [student_name])-1)
                        else [student_name] end
                        ,[class_id]
                        ,case when reverse(SUBSTRING(REVERSE([contact_detail]),1,CHARINDEX(':', REVERSE([contact_detail])))) like ':' 
                                        then ([contact_detail] + '|')
                                        else    [contact_detail]
                                        end as [contact_detail]
                         ,[birth_date]
                         ,CAST([note_average] AS decimal(13,2)) as [note_average]
                         ,GETDATE()
            FROM [Table A] 
            WHERE CAST([birth_date] AS DATE) <= CAST(GETDATE() AS DATE)
                    AND LEN([student_name]) >= 5
                    AND NOT EXISTS
                            (
                      SELECT [student_name]
                                              ,[class_id]
                                              ,[contact_detail]
                                              ,[birth_date]
                      FROM [Table A]  a
                      WHERE  '%' + ods.[student_name] + '%' LIKE a.[student_name]
                                AND '%' +  ods.[class_id] + '%'  LIKE a.[class_id]
                                AND '%' + ods.[contact_detail] + '%' LIKE a.[contact_detail]
                                AND ods.[birth_date] = a.[birth_date]
                            )
GO

I don't want to insert duplicate values and I don't have an key in my table. My problem is: this query is taking a lot of time to insert the new values. I'm trying to insert 1000000 rows.

Which alternatives I've?

Many thanks!

John_Rodgers
  • 181
  • 1
  • 11
  • I believe replacing your select inside the `Not Exists` clause with `SELECT 'x' From...` it performs a bit better, since it dumps out once it finds the first match. – Obsidian Phoenix Jul 04 '17 at 15:27
  • 1
    @John_Rodgers - Do you really want to use LIKE for comparisons? Are you looking for exact matches or similar matches? Especially for things like "class_id". – Doug Knudsen Jul 04 '17 at 15:49
  • The fields [student_name],[class_id],[contact_detail],[birth_date] make the Business Key of my table. That's why I put all that comparasions – John_Rodgers Jul 04 '17 at 16:03
  • @John_Rodgers - That makes sense, I was just questioning the use of LIKE vs equals for optimization purposes. In other words, only using LIKE when really necessary. – Doug Knudsen Jul 04 '17 at 16:37

1 Answers1

0

Assuming you really want to match on exact duplicate matches, try something like this:

INSERT INTO [Table A]  ([student_name], [os_name], [class_id],[contact_detail], [birth_date],[note_average])
    SELECT ods.[student_name] = case when CHARINDEX('.', ods.[student_name])>0 then LEFT(ods.[student_name],CHARINDEX('.', ods.[student_name])-1)
            else ods.[student_name] end
        ,ods.[class_id]
        ,case when reverse(SUBSTRING(REVERSE(ods.[contact_detail]),1,CHARINDEX(':', REVERSE(ods.[contact_detail])))) like ':' 
            then (ods.[contact_detail] + '|')
            else    ods.[contact_detail]
            end as [contact_detail]
         ,ods.[birth_date]
         ,CAST(ods.[note_average] AS decimal(13,2)) as ods.[note_average]
         ,GETDATE()
    FROM [Table A] ods
    OUTER JOIN [Table A] a ON ods.[student_name] = a.[student_name]
        AND ods.[os_name] = a.[os_name]
        AND ods.[class_id] = a.[class_id]
        AND ods.[contact_detail] = a.[contact_detail]
        AND ods.[birth_date] = a.[birth_date]
    WHERE CAST(ods.[birth_date] AS DATE) <= CAST(GETDATE() AS DATE)
        AND LEN(ods.[student_name]) >= 5
        -- Only include when no matching duplicate is found.
        AND a.[student_name] IS NULL

You could stick with a sub-query and NOT EXISTS if you want and that would be fine as well but use SELECT 0 or similar instead of selecting all those columns, which isn't necessary. Changing the comparisons from LIKE to = will, I believe, give you the results you want along with making the query optimizable with indexes, if necessary.

AND NOT EXISTS
(
  SELECT 0
  FROM [Table A]  a
  WHERE  ods.[student_name] = a.[student_name]
            AND ods.[os_name] = a.[os_name]
            AND ods.[class_id] = a.[class_id]
            AND ods.[contact_detail] = a.[contact_detail]
            AND ods.[birth_date] = a.[birth_date]
        )

Once you have this query working correctly, then you can look at adding indexes if you need better performance. You might be able to just add an index on [birth_date] or [student_name] to get acceptable query performance.

Doug Knudsen
  • 935
  • 9
  • 15
  • `EXISTS` clause ignores columns list in the `SELECT`. So `SELECT 0` is pointless. If you want something special you can write `EXISTS( SELECT 1 / 0` - it will still work. – Alex Jul 05 '17 at 02:53
  • @Alex - Couldn't agree with you more. It doesn't matter, which was the point I was trying to make. Although maybe my example was bad (but is what I typically use). – Doug Knudsen Jul 05 '17 at 11:16