10

I have a long list of 5,000+ IDs (numbers).

ID
4
5
6
9
10
14
62
63
655
656
657
658
659
661
662

I would like to know if there is a way to read the IDs from the txt file instead of typing all 5,000 into the query.

Example:

SELECT count(*) from table where ID in (file1.txt)
user206168

  • It'd be best if you load these into a table temporarily and then do an inner join with your existing table. As long as you have the right indexes, I expect performance to be faster than doing `ID in (thousands of values)`. – zedfoxus Sep 27 '17 at 19:53
  • Not sure which version of SQL you are using, but in Firebird 2.5 I find the query fails if I have more than 1500 values in the "in" clause. – Eric Sep 27 '17 at 19:54
  • There are a few ways you can import a file into a table/temp table: https://learn.microsoft.com/en-us/sql/relational-databases/blob/load-files-into-filetables https://www.codeproject.com/Articles/290242/Import-Data-from-a-Text-or-CSV-file-into-SQL-Serve – NonProgrammer Sep 27 '17 at 19:58
  • @zedfoxus I like your idea. Can you please explain a bit more about the indexes? – user206168 Sep 27 '17 at 19:58
  • What DBMS product is this? The different products are *extremely* different when it comes to things like reading or writing external files. And for most of them it's usually easier to just copy and paste the list into a table from the GUI, if it's a one-time need. – RBarryYoung Sep 27 '17 at 19:59
  • With 5000+ records, dropping them into a temp table and joining against it makes sense. If you frequently have smaller lists of IDs you're working with, it might be worth the time and effort to hammer out a simple C#/Java/VB.NET/C++ program to read them in from a text file and delimit the output. – user2366842 Sep 27 '17 at 20:20
  • @user206168 I've added an answer. Sorry I couldn't get back to you sooner. Feel free to comment on the answer. I think you asked a really good question. – zedfoxus Sep 28 '17 at 13:34

3 Answers

4

Step 1: Copy all your values into Sublime or Notepad++.

Step 2: Press Ctrl+H and choose the "Regular expression" option.

Step 3: To add "," to the end of each line, type $ in the "Find what" field and "," in the "Replace with" field, then hit "Replace All".

Then simply copy-paste the values into your SQL query:

SELECT COUNT(*) FROM `admins` WHERE id in (4,
5,
6,
9,
10,
14,
62,
63,
655,
656,
657,
658,
659,
661,
662)

PS: Remember to remove the comma from the last value.
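If you'd rather not do the editor dance, a few lines of scripting can build the same comma-separated list for you. This is only a Python sketch; the file name file1.txt comes from the question, and the table name is the one used above:

# Read one ID per line from file1.txt and print a ready-to-paste query.
with open("file1.txt") as f:
    ids = [line.strip() for line in f if line.strip()]

print("SELECT COUNT(*) FROM `admins` WHERE id IN ({});".format(", ".join(ids)))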

Sagar Ahuja
  • While this will work (at least in most DBs), having an `in` condition with the 5000+ items that the OP mentioned will result in an extremely slow execution. – user2366842 Sep 27 '17 at 20:26
4

You have a few options, one of which is my recommended approach.

Option 1

Create a table in your database like so:

create table ID_Comparer (
    ID int primary key
);

With a programming language of your choice, empty out the table and then load the 5000+ IDs that you eventually want to query into this table.
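For example, here is a minimal loading sketch in Python. It uses the standard sqlite3 module purely as a stand-in DB-API driver, and the file name file1.txt and the connection details are assumptions; with MySQL, SQL Server, etc. you'd use that database's own driver and its placeholder style instead:

import sqlite3  # stand-in for whatever DB-API driver your database uses

conn = sqlite3.connect("example.db")     # replace with your real connection
cur = conn.cursor()

cur.execute("DELETE FROM ID_Comparer")   # empty out the comparer table first

with open("file1.txt") as f:             # one ID per line, as in the question
    ids = [(int(line),) for line in f if line.strip()]

cur.executemany("INSERT INTO ID_Comparer (ID) VALUES (?)", ids)
conn.commit()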

Then, write one of these queries to extract the data you want:

select *
from main_table m
where exists (
    select 1 from ID_Comparer where ID = m.ID
)

or

select *
from main_table m
inner join ID_Comparer c on m.ID = c.ID

Since ID_Comparer's ID is keyed and (assuming that) main_table's ID is indexed/keyed too, matching should be relatively fast.

Option 1 modified

This option is just like the one above but helps a bit with concurrency. That is, if application 1 wants to compare 2000 IDs while application 2 wants to compare 5000 IDs against your main table at the same time, you wouldn't want one process to delete the other's data from the comparer table. So, change the table a bit:

create table ID_Comparer (
    ID int primary key,
    token char(32), -- index this
    entered date default current_date() -- use the syntax of your DB
);

Then, use your favorite programming language to create a GUID. Load all the IDs along with that same GUID into the table, like so:

1, 7089e5eced2f408eac8b390d2e891df5
2, 7089e5eced2f408eac8b390d2e891df5
...

Another process doing the same thing would load its own IDs with its own GUID:

2412, 96d9d6aa6b8d49ada44af5a99e6edf56
9434, 96d9d6aa6b8d49ada44af5a99e6edf56
...
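Continuing the sqlite3 stand-in sketch from above (file name and connection are still assumptions), one batch's load might look like this; uuid.uuid4().hex yields a 32-character value that fits the char(32) token column:

import sqlite3
import uuid

conn = sqlite3.connect("example.db")     # same stand-in connection as before
cur = conn.cursor()

token = uuid.uuid4().hex                 # 32-character GUID identifying this batch

with open("file1.txt") as f:
    rows = [(int(line), token) for line in f if line.strip()]

cur.executemany("INSERT INTO ID_Comparer (ID, token) VALUES (?, ?)", rows)
conn.commit()

# Keep the token around: the SELECT below and the cleanup DELETE both filter on it.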

Now, your select:

select *
from main_table m
where exists (
    select 1 from ID_Comparer where ID = m.ID and token = '<your guid>'
)

OR

select *
from main_table m
inner join ID_Comparer c on m.ID = c.ID and token = '<your guid>'

After you receive your data, be sure to run `delete from ID_Comparer where token = '<your guid>'` - that's just nice cleanup.

You could create a nightly task to remove all data that's more than 2 days old or some such for additional housekeeping.

Since ID_Comparer's ID is keyed and (assuming that) main_table's ID is indexed/keyed too, matching should be relatively fast even though the GUID adds an extra keyed lookup.

Option 2

Instead of creating a table, you could create a large SQL query like so:

select * from main_table where id = <first id>
union select * from main_table where id = <second id>
union select * from main_table where id = <third id>
...

OR

select * from main_table where id IN (<first 5 ids>)
union select * from main_table where id IN (<next 5 ids>)
union select * from main_table where id IN (<next 5 ids>)
...

If the performance is acceptable and if creating a new table like in option 1 doesn't feel right to you, you could try one of these methods.

Assuming that main_table's ID is indexed/keyed, matching individual IDs might result in a faster query than matching against one long list of comma-separated values. That's speculation, though. You'll have to look at the query plan and run it against a test case.
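If you go this route, you would typically generate that query rather than type it out. A hypothetical Python sketch (file name and chunk size are assumptions) for the second variant:

# Build Option 2's chunked query: groups of 5 IDs stitched together with UNION.
with open("file1.txt") as f:
    ids = [line.strip() for line in f if line.strip()]

chunks = [ids[i:i + 5] for i in range(0, len(ids), 5)]
query = "\nunion ".join(
    "select * from main_table where id IN ({})".format(", ".join(chunk))
    for chunk in chunks
)
print(query)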

Which option to choose?

Testing these options should be quick. I'd recommend trying all of them against your database engine and your table size to see which one suits your use case best.

zedfoxus
0

EASY ANSWER (PostgreSQL, Ubuntu 16.04):

Say you have a table userdogs with many, many users and their dogs' names:

id    user     dog

You also have a file friendsfile.txt containing the names of your friends, and you'd like to select only your friends from the table userdogs.

friendsfile.txt:

Emily
John
Bill
Charlie
Cameron

1. Create a new table and load the file's contents into it:

CREATE TABLE friends (name varchar(200));

COPY friends
FROM '/home/friendsfile.txt' 
WITH DELIMITER '~';

If the file instead puts everything on one line, like this:

Emily/John/Bill/Charlie/Cameron

COPY by itself won't split that: its delimiter separates columns, not rows, so you'd need to break the names onto separate lines first (my case was one name per line).

2. Then you go and select them:

SELECT DISTINCT "user", dog FROM userdogs
WHERE (SELECT COUNT(*)
       FROM friends
       WHERE friends.name = userdogs."user") > 0;

("user" is quoted because user is a reserved word in PostgreSQL.)
parsecer