2

Can anybody help me work out how to approach a common problem that I come across every now and then but tend to work around the long way? I'm trying to find an elegant way to identifying what Id's don't exist in a table given a large list of values.

I've been provided with an Excel sheet with a list of 4000+ string id's and I need to identify which of them don't exist in SQL Server table.

My first action was to use Notepad++ and convert the list into a CSV list similar to 'XX1','XX2','XX3' etc and the running a SELECT * FROM [TABLE] WHERE [ID] IN ('XX1','XX2','XX3',...). This shows up 2 fewer records than shown in Excel so now I need to sift through them to identify which two aren't in the database.

Thanks for any help.

Paul.

Paul
  • 69
  • 2
  • 10
  • 2
    Normally you would have the IDs to check in one table. Then you can LEFT JOIN against the actual table with records and see what is missing – juergen d Jun 08 '16 at 11:10
  • Thanks @juergend - thi is a case where I don't have the Id's in a table - thy're being presented to me this time in Excel (sometimes via emails or other means but essentially outside the db structure otherwise it would be simpler) – Paul Jun 08 '16 at 11:17
  • Thanks @LukStorms any tips for getting the data simply into a temp table without scripting it too much? Once it's in there I'm happy enough with TSQL to do what I need but I'm wondering if there is a simple way to get the list into the db easily to be then worked with. – Paul Jun 08 '16 at 11:19
  • @Paul I add an extra answer although Lasse's answer is quite good. – LukStorms Jun 08 '16 at 11:42

3 Answers3

3

You can join against an inline table generated with the table value constructor, provided it doesn't overflow with that many id's, in which case you need to dump them into a table.

The limit for number of rows in one VALUES expression is 1000, according to the documentation:

The maximum number of rows that can be constructed by inserting rows directly in the VALUES list is 1000

To join with an inline table, use the syntax:

INNER JOIN (VALUES (1), (2), ...) AS inline (id)

Here's an example, with this table (your actual table):

CREATE TABLE databasetable (ID INT NOT NULL)
INSERT INTO databasetable VALUES (1), (2), (3), (5), (6), (8), (9), (10)

you can query for the missing values like this:

SELECT
    excel.id
FROM
    databasetable AS A
    RIGHT JOIN (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) AS excel (id)
        ON A.id = excel.id
WHERE
    A.id IS NULL

You would thus have to generate the VALUES (1), (2), (3), ...) from your Excel spreadsheet.

To use a temporary table, if you have more than 1000 rows:

DECLARE @EXCEL TABLE (ID INT)
INSERT INTO @EXCEL VALUES (1), (2), (3), ..... -- first 1000
INSERT INTO @EXCEL VALUES (1), (2), (3), ..... -- next 1000
INSERT INTO ...                                -- and so on

SELECT
    @excel.id
FROM
    databasetable AS A
    RIGHT JOIN @excel
        ON A.id = @excel.id
WHERE
    A.id IS NULL
Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
  • Perfect Thanks! I'd used the table value constructor before but not in this situation and with a little manipulation of the input list formatting this has worked perfectly for just over 4000 values and I've now found my missing Id! Many thanks! – Paul Jun 08 '16 at 11:28
  • Note my remark about the possible limit. It may be that you cannot create an inline table with 4000 rows, you will have to see what works. – Lasse V. Karlsen Jun 08 '16 at 11:30
  • You may want to un-accept my answer after my edit, hang on a sec. – Lasse V. Karlsen Jun 08 '16 at 11:30
  • When using your example with the VALUES table inline I achived what I needed with 4012 values listed. This was on SQL 2008 R2 SP1 – Paul Jun 08 '16 at 11:33
1

Just a few demo's to add to the possibilies

Via a table variable.
This time with a NOT IN clause:

DECLARE @varTbl TABLE(ID varchar(8));

insert into @varTbl values ('XX1'),('XX2');

select id from @varTbl 
where id not in (select distinct id from [SomeTable]);

Via a temporary table that only exists during your session.
This time with a LEFT JOIN.

CREATE TABLE #tempTbl (ID varchar(8));

insert into #tempTbl values ('XX1'),('XX2');

select tmp.id 
from #tempTbl tmp
left join [SomeTable] t on (tmp.id = t.id)
where t.id is null
group by tmp.id
order by tmp.id;

Another completely diffent method is to compare via files. You first export the ID's of your reference data and the ID's from the destination table to sorted text files. Then you filter out the differences of lines that exist in the reference data but not in the destination table.
On Windows it's something you could do via PowerShell.

$ref = Get-Content WhatShouldBe.txt
$dest = Get-Content WhatIsNow.txt

Compare-Object $ref $dest |Where-Object {$_.SideIndicator -eq "<="} |Select -ExpandProperty InputObject > WhatCanBeAdded.txt
LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • Hi Luk, The examples would show the records that exist in the database but don't exist in the IN expression which is the opposite of what I was looking for but thanks for trying. – Paul Jun 08 '16 at 11:42
  • Heh, I indeed misunderstood that. Fixed the statements. The principle is the same anyway. – LukStorms Jun 08 '16 at 12:28
  • 1
    @Paul I was wondering how to do this easy on Windows via comparing simple text files with the ID's sorted. So I added a solution that uses PowerShell. – LukStorms Jun 08 '16 at 14:20
  • 1
    Whilst the additional option adds too much complexity for me (as I'm not that familiar with PowerShell) it's impressive that you've worked through additional options! I'm hoping that in future I'll remember the table value constructor used by Lasse as it's a quick and simple tip to remember when woking away is SSMS. Cheers – Paul Jun 08 '16 at 15:16
1

Believing you can create tables in database

If you are getting files in excel sheet or in email move the content to an excel sheet if its an email or directly import the excel sheet using Import data wizard to a staging table after that you can compare the data in staging table and your original table easily which much easier than loading the id in table variables or #temp tables. This video shows how to import excel data to sql server https://www.youtube.com/watch?v=Z1vqhYlwcyA

Also do a random check that all the values are populated or is any NULL values are inserted also.

One more thing to add using IN clause can which having thousands of values can throw an error. Please check this page in SO "IN" clause limitation in Sql Server

Community
  • 1
  • 1
Biju jose
  • 263
  • 2
  • 15