0

When I execute my "select union select", I get the correct number or rows (156) Executed independently, select #1 returns 65 rows and select #2 returns 138 rows.

When I use this "select union select" with an Insert into, I get 203 rows (65+138) with duplicates.

I would like to know if it is my code structure that is causing this issue ?

INSERT INTO dpapm_MediaObjectValidation (mediaobject_id, username, checked_date, expiration_date, notified)

    (SELECT FKMediaObjectId, CreatedBy,@checkdate,dateadd(ww,2,@checkdate),0
    FROM dbo.gs_MediaObjectMetadata
    LEFT JOIN gs_MediaObject mo
    ON gs_MediaObjectMetadata.FKMediaObjectId = mo.MediaObjectId
    WHERE UPPER([Description]) IN ('CAPTION','TITLE','AUTHOR','DATE PHOTO TAKEN','KEYWORDS')
    AND FKMediaObjectId >= 
        (SELECT TOP 1 MediaObjectId 
            FROM dbo.gs_MediaObject 
            WHERE DateAdded > @lastcheck 
            ORDER BY MediaObjectId) 
    GROUP BY FKMediaObjectId, CreatedBy
    HAVING count(*) < 5

    UNION

    SELECT FKMediaObjectId, CreatedBy,getdate(),dateadd(ww,2,getdate()),0
    FROM gs_MediaObjectMetadata yt
    LEFT JOIN gs_MediaObject mo
    ON yt.FKMediaObjectId = mo.MediaObjectId
    WHERE UPPER([Description]) = 'KEYWORDS'
    AND FKMediaObjectId >= 
        (SELECT TOP 1 MediaObjectId 
            FROM dbo.gs_MediaObject 
            WHERE DateAdded > @lastcheck
            ORDER BY MediaObjectId) 
    AND NOT EXISTS
            (
            SELECT  *
            FROM    dbo.fnSplit(Replace(yt.Value, '''', ''''''), ',') split
            WHERE   split.item in (SELECT KeywordEn FROM gs_Keywords) or split.item in (SELECT KeywordFr FROM gs_Keywords)
            )
    )

I would appreciate any clues into resolving this problem ...

Thank you !

crichard
  • 39
  • 1
  • 6
  • 3
    A `SELECT` statement by itself should return the same results when used in `INSERT INTO... SELECT`. Are you sure that query returns different results when executed by itself versus being used to insert into your table? That doesn't seem possible to me... – Michael Fredrickson Mar 26 '12 at 18:20
  • Yes, I'm sure the select & union is ok. That's why I'm wondering if it is some sort of "order of operations" problem. But I just did some more tests, and if I call my sproc from SQLServer directly, the result is not ok, but when I run the code in a query, it works fine. – crichard Mar 26 '12 at 18:25

1 Answers1

0

The UNION keyword should only return distinct records between the two queries. However, if I recall correctly, this is only true if the datatypes are the same. The date variables might be throwing that off. Depending on the collation type, whitespace might be handled differently as well. You might want to do a SELECT DISTINCT on the dpapm_MediaObjectValidation table after doing your insert, but be sure to trim whitespace from both sides in your comparison. Another approach is to do your first insert, then on your second insert, forgo the UNION altogether and do a manual EXISTS check to see if the items to be inserted already exist.

Nolan St. Martin
  • 407
  • 1
  • 3
  • 16
  • I modified my sproc to use your second solution (Exists with 2 inserts). That seems to do the trick. Do you know if this is a big performance hit? – crichard Mar 26 '12 at 18:42
  • Usually it doesn't if the dataset you're querying isn't gigantic. I personally don't like using tricks like this, but sometimes you have no choice due to things not behaving as expected. However, according to the following stackoverflow post, using the EXISTS keyword itself should help reduce some of the load --> http://stackoverflow.com/questions/2065329/sql-server-in-vs-exists-performance – Nolan St. Martin Mar 26 '12 at 20:16