399

I want to insert data into my table, but insert only data that doesn't already exist in my database.

Here is my code:

ALTER PROCEDURE [dbo].[EmailsRecebidosInsert]
  (@_DE nvarchar(50),
   @_ASSUNTO nvarchar(50),
   @_DATA nvarchar(30) )
AS
BEGIN
   INSERT INTO EmailsRecebidos (De, Assunto, Data)
   VALUES (@_DE, @_ASSUNTO, @_DATA)
   WHERE NOT EXISTS ( SELECT * FROM EmailsRecebidos 
                   WHERE De = @_DE
                   AND Assunto = @_ASSUNTO
                   AND Data = @_DATA);
END

And the error is:

Msg 156, Level 15, State 1, Procedure EmailsRecebidosInsert, Line 11
Incorrect syntax near the keyword 'WHERE'.

mohsen mashhadi
  • 201
  • 2
  • 8
Francisco Carvalho
  • 4,123
  • 3
  • 16
  • 12
  • 15
    You should not rely on this check alone to ensure no duplicates, it is not thread safe and you will get duplicates when a race condition is met. If you really need unique data add a unique constraint to the table, and then catch the unique constraint violation error. [See this answer](http://stackoverflow.com/a/3408196/1048425) – GarethD Jan 07 '14 at 12:54
  • 1
    You can use MERGE query or If not exist( select statement ) begin insert values END – Abdul Hannan Ijaz Jan 20 '16 at 06:50
  • It depends on the scenario if you should relay or not on this check. If you are developing a deploy script that writes data to a "static" table for example, this is not an issue. – AxelWass Nov 09 '16 at 16:48
  • you can use "if not exists (select * from..." like this http://stackoverflow.com/a/43763687/2736742 – A. Morel May 03 '17 at 15:11
  • 2
    @GarethD: what do you mean "not thread safe"? It may not be elegant but it looks correct to me. A single `insert` statement is always a single transaction. It's not as if the SQL Server evaluates the subquery first and then at some later point, and without holding a lock, goes on to do the insert. – Ed Avis Aug 17 '17 at 11:40
  • 1
    @EdAvis That is exactly what happens, unless you explicitly use a transaction and the `UPDLOCK` and `HOLDLOCK` query hints, the lock on `EmailsRecebidos` will be released as soon as the check is done, momentarily before the write to the same table. In this split second, another thread can still read the table and assume records don't exist and encounter the race condition. By using the explicit transactions and the locking hints, and can stop the lock on the table being released after the select statement is finished. The lock will be held until the transaction is committed. – GarethD Aug 17 '17 at 12:11
  • This is discussed further here [Conditional INSERT/UPDATE Race Condition](http://weblogs.sqlteam.com/dang/archive/2007/10/28/Conditional-INSERTUPDATE-Race-Condition.aspx). This is besides the point though, if data should be unique, then it should be constrained to be unique. Without a constraint there would be nothing to stop someone just bypassing the stored procedure completely, and inserting duplicate records with a simple insert script. – GarethD Aug 17 '17 at 12:16
  • EXCEPT is a good way, as well as MERGE and LEFT JOIN – Mark Homer Aug 04 '22 at 17:56

13 Answers13

491

instead of below Code

BEGIN
   INSERT INTO EmailsRecebidos (De, Assunto, Data)
   VALUES (@_DE, @_ASSUNTO, @_DATA)
   WHERE NOT EXISTS ( SELECT * FROM EmailsRecebidos 
                   WHERE De = @_DE
                   AND Assunto = @_ASSUNTO
                   AND Data = @_DATA);
END

replace with

BEGIN
   IF NOT EXISTS (SELECT * FROM EmailsRecebidos 
                   WHERE De = @_DE
                   AND Assunto = @_ASSUNTO
                   AND Data = @_DATA)
   BEGIN
       INSERT INTO EmailsRecebidos (De, Assunto, Data)
       VALUES (@_DE, @_ASSUNTO, @_DATA)
   END
END

Updated : (thanks to @Marc Durdin for pointing)

Note that under high load, this will still sometimes fail, because a second connection can pass the IF NOT EXISTS test before the first connection executes the INSERT, i.e. a race condition. See stackoverflow.com/a/3791506/1836776 for a good answer on why even wrapping in a transaction doesn't solve this.

Community
  • 1
  • 1
Imran Ali Khan
  • 8,469
  • 16
  • 52
  • 77
  • 41
    Note that under high load, this will still sometimes fail, because a second connection can pass the IF NOT EXISTS test before the first connection executes the INSERT, i.e. a race condition. See See http://stackoverflow.com/a/3791506/1836776 for a good answer on why even wrapping in a transaction doesn't solve this. – Marc Durdin Sep 25 '14 at 06:52
  • 11
    SELECT 1 FROM EmailsRecebidos WHERE De = @_DE AND Assunto = @_ASSUNTO AND Data = @_DATA To use 1 instead of * would be more efficient – Reno Feb 10 '15 at 15:13
  • `select *` is bad parformance, better select the ID column. Or - one of the cols in the `where` clause, cause they're being read anyway – jazzcat Jan 26 '16 at 10:04
  • 1
    Put a write lock around the whole thing and then you won't have any chance of duplicates. – Kevin Finkenbinder Mar 31 '16 at 09:53
  • 16
    @jazzcat `select *` in this case makes no difference whatsoever because it's being used in an `EXISTS` clause. SQL Server will always optimize it and has been doing it for ages. Since I'm very old I usually write these queries as `EXISTS (SELECT 1 FROM...)` but it is not needed anymore. – Loudenvier Oct 07 '16 at 01:59
  • @jazzcat no, `select *` in an exists or Not exists is perfectly fine. This is exception to what is (otherwise) a good rule. – Charles Bretana Dec 16 '16 at 14:38
  • In the IF statement: there's no need to use BEGIN & END if the number of required command lines is just one even if you used more than one line, so you can omit it here. – Wessam El Mahdy Mar 06 '17 at 21:52
  • 38
    Why does this kind of simple question generate more doubt than certainty? – drowa Jun 28 '17 at 18:26
  • If anybody's looking for an SQLite solution (it doesn't really like the `IF`s), use `INSERT OR IGNORE`: https://stackoverflow.com/a/19343100/616460 – Jason C Jan 22 '22 at 18:15
143

For those looking for the fastest way, I recently came across these benchmarks where apparently using "INSERT SELECT... EXCEPT SELECT..." turned out to be the fastest for 50 million records or more.

Here's some sample code from the article (the 3rd block of code was the fastest):

INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData)
SELECT Id, guidd, TimeAdded, ExtraData
FROM #table2
WHERE NOT EXISTS (Select Id, guidd From #table1 WHERE #table1.id = #table2.id)
-----------------------------------
MERGE #table1 as [Target]
USING  (select Id, guidd, TimeAdded, ExtraData from #table2) as [Source]
(id, guidd, TimeAdded, ExtraData)
    on [Target].id =[Source].id
WHEN NOT MATCHED THEN
    INSERT (id, guidd, TimeAdded, ExtraData)
    VALUES ([Source].id, [Source].guidd, [Source].TimeAdded, [Source].ExtraData);
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT id, guidd, TimeAdded, ExtraData from #table2
EXCEPT
SELECT id, guidd, TimeAdded, ExtraData from #table1
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT #table2.id, #table2.guidd, #table2.TimeAdded, #table2.ExtraData
FROM #table2
LEFT JOIN #table1 on #table1.id = #table2.id
WHERE #table1.id is null
  • 11
    I like EXCEPT SELECT – Bryan Aug 22 '16 at 16:15
  • But EXCEPT may not be efficient for bulk operations. – Aasish Kr. Sharma Feb 23 '18 at 03:31
  • 1
    @Biswa: Not according to those benchmarks. The code is available from the site. Feel free to run it on your system to see how the results compare. –  Dec 18 '18 at 08:04
  • 1
    Beware,those benchmarks are testing "inserting from another table" not explicit values as requested in the question – Alex from Jitbit Jan 17 '20 at 17:42
  • 1
    Been playing with these; looks like the downside to the EXCEPT form is that it'll only skip the insert if *every* field in the existing record matches the record being inserted. Which might be fine. But for me I just wanted to skip if the id existed regardless of the other values, so I had to go for the first form. Super cool though never used EXCEPT before. – Jason C Jan 22 '22 at 18:05
  • PS And for an SQLite solution that just keys on the ID, you can use `INSERT OR IGNORE`: https://stackoverflow.com/a/19343100/616460 – Jason C Jan 22 '22 at 18:17
  • I don't understand the many votes on this one. None of the solutions presented in the benchmark in this answer are actually safe to use under concurrent use. I would think the OP wanted a solution that is safe to use as the very first priority (never fail under load), whereas performance would be something you would look at _only_ when when you had solved that first requirement ? – peterh Aug 12 '22 at 06:06
52

Different SQL, same principle. Only insert if the clause in where not exists fails

INSERT INTO FX_USDJPY
            (PriceDate, 
            PriceOpen, 
            PriceLow, 
            PriceHigh, 
            PriceClose, 
            TradingVolume, 
            TimeFrame)
    SELECT '2014-12-26 22:00',
           120.369000000000,
           118.864000000000,
           120.742000000000,
           120.494000000000,
           86513,
           'W'
    WHERE NOT EXISTS
        (SELECT 1
         FROM FX_USDJPY
         WHERE PriceDate = '2014-12-26 22:00'
           AND TimeFrame = 'W')
Malcolm Swaine
  • 1,929
  • 24
  • 14
33

I would use a merge:

create PROCEDURE [dbo].[EmailsRecebidosInsert]
  (@_DE nvarchar(50),
   @_ASSUNTO nvarchar(50),
   @_DATA nvarchar(30) )
AS
BEGIN
   with data as (select @_DE as de, @_ASSUNTO as assunto, @_DATA as data)
   merge EmailsRecebidos t
   using data s
      on s.de = t.de
     and s.assunte = t.assunto
     and s.data = t.data
    when not matched by target
    then insert (de, assunto, data) values (s.de, s.assunto, s.data);
END
Dale K
  • 25,246
  • 15
  • 42
  • 71
Brett Schneider
  • 3,993
  • 2
  • 16
  • 33
27

Try below code

ALTER PROCEDURE [dbo].[EmailsRecebidosInsert]
  (@_DE nvarchar(50),
   @_ASSUNTO nvarchar(50),
   @_DATA nvarchar(30) )
AS
BEGIN
   INSERT INTO EmailsRecebidos (De, Assunto, Data)
   select @_DE, @_ASSUNTO, @_DATA
   EXCEPT
   SELECT De, Assunto, Data from EmailsRecebidos
END
Turnip
  • 35,836
  • 15
  • 89
  • 111
SaravanaC
  • 396
  • 2
  • 4
22

I did the same thing with SQL Server 2012 and it worked

Insert into #table1 With (ROWLOCK) (Id, studentId, name)
SELECT '18769', '2', 'Alex'
WHERE not exists (select * from #table1 where Id = '18769' and studentId = '2')
Dale K
  • 25,246
  • 15
  • 42
  • 71
14

The INSERT command doesn't have a WHERE clause - you'll have to write it like this:

ALTER PROCEDURE [dbo].[EmailsRecebidosInsert]
  (@_DE nvarchar(50),
   @_ASSUNTO nvarchar(50),
   @_DATA nvarchar(30) )
AS
BEGIN
   IF NOT EXISTS (SELECT * FROM EmailsRecebidos 
                   WHERE De = @_DE
                   AND Assunto = @_ASSUNTO
                   AND Data = @_DATA)
   BEGIN
       INSERT INTO EmailsRecebidos (De, Assunto, Data)
       VALUES (@_DE, @_ASSUNTO, @_DATA)
   END
END
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 3
    You need to handle errors for this procedure because there will be cases where an insert will happen between the check and insert. – Filip De Vos Jan 07 '14 at 12:53
  • @FilipDeVos: true - a possibility, maybe not very likely, but still a possibility. Good point. – marc_s Jan 07 '14 at 13:00
  • What if you wrap both within a transaction? Would that block the possibility? (I'm no expert on transactions, so please forgive if this is a stupid question.) – David Sep 04 '14 at 21:20
  • 1
    See http://stackoverflow.com/a/3791506/1836776 for a good answer on why a transaction doesn't solve this, @David. – Marc Durdin Sep 25 '14 at 06:51
  • 2
    In the IF statement: there's no need to use BEGIN & END if the number of required command lines is just one even if you used more than one line, so you can omit it here. – Wessam El Mahdy Mar 06 '17 at 21:53
11

If your clustered index consists of only those fields then the simple, fast and reliable option is to use IGNORE_DUP_KEY

If you create the Clustered index with IGNORE_DUP_KEY ON

Then you can just use:

INSERT INTO EmailsRecebidos (De, Assunto, Data) VALUES (@_DE, @_ASSUNTO, @_DATA)

This should be safe in all cases!

codejockie
  • 9,020
  • 4
  • 40
  • 46
Alexander Bartosh
  • 8,287
  • 1
  • 21
  • 22
10

Just change your code to use SELECT instead of VALUES

   INSERT INTO EmailsRecebidos (De, Assunto, Data)
   SELECT @_DE, @_ASSUNTO, @_DATA
   WHERE NOT EXISTS (SELECT * FROM EmailsRecebidos 
                   WHERE De = @_DE
                   AND Assunto = @_ASSUNTO
                   AND Data = @_DATA);
Leon Tayson
  • 4,741
  • 7
  • 37
  • 36
  • 2
    only correct answer here, incredible – gordy Oct 20 '22 at 04:10
  • To me, it looks like two, separate statements: one inserting regardless of the current state of the DB and another one selecting 3 columns based on some sub-query. Naturally, I assume I'm mistaken, which leads to the question what I'm missing. Please advise. – Konrad Viltersten Jul 02 '23 at 07:47
  • @KonradViltersten it is a single statement. `INSERT INTO` can’t function without any values (try it out, it’s a syntax error), so whatever comes next will be interpreted as the values to be inserted. This can be a `VALUES` clause with static values or, as in this case, the result set of a fully featured `SELECT`. – BenderBoy Jul 07 '23 at 02:11
  • 1
    @BenderBoy Woosh! I just saw it. I knew I was missing something! Can you imagine I've been doing backend for 17 years or so and never got there?! I always use `values(...)` and put in variables declared based on previous queries. This was an epiphany. Darn... Thanks, mate! – Konrad Viltersten Jul 26 '23 at 13:37
9

Depending on your version (2012?) of SQL Server aside from the IF EXISTS you can also use MERGE like so:

ALTER PROCEDURE [dbo].[EmailsRecebidosInsert]
    ( @_DE nvarchar(50)
    , @_ASSUNTO nvarchar(50)
    , @_DATA nvarchar(30))
AS BEGIN
    MERGE [dbo].[EmailsRecebidos] [Target]
    USING (VALUES (@_DE, @_ASSUNTO, @_DATA)) [Source]([De], [Assunto], [Data])
         ON [Target].[De] = [Source].[De] AND [Target].[Assunto] = [Source].[Assunto] AND [Target].[Data] = [Source].[Data]
     WHEN NOT MATCHED THEN
        INSERT ([De], [Assunto], [Data])
        VALUES ([Source].[De], [Source].[Assunto], [Source].[Data]);
END
Don
  • 9,511
  • 4
  • 26
  • 25
-1

You could use the GO command. That will restart the execution of SQL statements after an error. In my case I have a few 1000 INSERT statements, where a handful of those records already exist in the database, I just don't know which ones. I found that after processing a few 100, execution just stops with an error message that it can't INSERT as the record already exists. Quite annoying, but putting a GO solved this. It may not be the fastest solution, but speed was not my problem.

GO
INSERT INTO mytable (C1,C2,C3) VALUES(1,2,3)
GO
INSERT INTO mytable (C1,C2,C3) VALUES(4,5,6)
 etc ...
Dale K
  • 25,246
  • 15
  • 42
  • 71
mljm
  • 327
  • 3
  • 13
-1

If you want to check whether a key exists or not, you can use:

INSERT INTO tableName (...) VALUES (...) 
ON DUPLICATE KEY 
UPDATE ...

Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.

Jay
  • 67
  • 1
  • 3
-4

As explained in below code: Execute below queries and verify yourself.

CREATE TABLE `table_name` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(255) NOT NULL,
  `address` varchar(255) NOT NULL,
  `tele` varchar(255) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB;

Insert a record:

INSERT INTO table_name (name, address, tele)
SELECT * FROM (SELECT 'Nazir', 'Kolkata', '033') AS tmp
WHERE NOT EXISTS (
    SELECT name FROM table_name WHERE name = 'Nazir'
) LIMIT 1;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0

SELECT * FROM `table_name`;

+----+--------+-----------+------+
| id | name   | address   | tele |
+----+--------+-----------+------+
|  1 | Nazir  | Kolkata   | 033  |
+----+--------+-----------+------+

Now, try to insert the same record again:

INSERT INTO table_name (name, address, tele)
SELECT * FROM (SELECT 'Nazir', 'Kolkata', '033') AS tmp
WHERE NOT EXISTS (
    SELECT name FROM table_name WHERE name = 'Nazir'
) LIMIT 1;

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

+----+--------+-----------+------+
| id | name   | address   | tele |
+----+--------+-----------+------+
|  1 | Nazir  | Kolkata   | 033  |
+----+--------+-----------+------+

Insert a different record:

INSERT INTO table_name (name, address, tele)
SELECT * FROM (SELECT 'Santosh', 'Kestopur', '044') AS tmp
WHERE NOT EXISTS (
    SELECT name FROM table_name WHERE name = 'Santosh'
) LIMIT 1;

Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0

SELECT * FROM `table_name`;

+----+--------+-----------+------+
| id | name   | address   | tele |
+----+--------+-----------+------+
|  1 | Nazir  | Kolkata   | 033  |
|  2 | Santosh| Kestopur  | 044  |
+----+--------+-----------+------+
Dale K
  • 25,246
  • 15
  • 42
  • 71
Vadiraj S J
  • 639
  • 10
  • 17