
I have a Node.js script which handles a large number of records and compares them to existing records in a database (SQL Server). The script dynamically constructs the query: it fetches data from an external source and compares it to my database. Example: I fetched 1000 records from the external source, and let's say all their IDs are now in an array called "ar". So I generate a query:

select * from my_table where ID in ('--list of all IDs from ar--')

At 1k, 2k and up to 10k records it all works well. But as the count keeps increasing, the queries start to time out.

So I applied a kind of paging mechanism: I split the array into fixed-size chunks and iterate over them. Meta code:

let total_results = 0
let i = 0
while i < ar.length {
    let temp_result = execute_query("select * from my_table where ID in ('--next 5K IDs from ar--')")
    i += 5000
    total_results += temp_result
}

*The above is just meta code, not real code.
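A runnable sketch of that chunking loop, assuming the `mssql` package with a connected pool (the `countMatches` name, the pool argument, and the chunk size are illustrative; `my_table` is from the question). One detail worth knowing: SQL Server caps a single request at roughly 2100 parameters, so when binding the IDs as parameters the chunk size must stay below that.

```javascript
// Split an array into fixed-size chunks.
function chunk(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

// Sketch only: query each chunk with a parameterised IN list instead of
// concatenating ids into the SQL string. Assumes the `mssql` package and
// a connected pool. SQL Server allows ~2100 parameters per request, so
// the chunk size must stay below that limit.
async function countMatches(pool, ids, chunkSize = 2000) {
  let total = 0;
  for (const part of chunk(ids, chunkSize)) {
    const request = pool.request();
    // Bind @p0, @p1, ... as parameters.
    const placeholders = part.map((id, i) => {
      request.input(`p${i}`, id);
      return `@p${i}`;
    });
    const result = await request.query(
      `select * from my_table where ID in (${placeholders.join(', ')})`
    );
    total += result.recordset.length;
  }
  return total;
}
```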

So I do it in chunks, and this way it isn't failing.

My question:

Is there some other better and faster solution I can use in SQL Server?

    Use a table type variable/parameter (as noted in the remarks of the [documentation](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/in-transact-sql?view=sql-server-ver15#remarks)). – Thom A May 24 '21 at 07:24
  • the ID column is indexed by the way – Gil Allen May 24 '21 at 07:24
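Thom A's table-valued-parameter suggestion might look like the sketch below with the `mssql` package. It assumes a user-defined table type (e.g. `CREATE TYPE dbo.IdList AS TABLE (ID int)`) and a stored procedure accepting it; `dbo.usp_MatchIds` and the pool argument are hypothetical names, not anything from the question.

```javascript
// Deduplicate ids before shipping them (pure helper).
function uniqueIds(ids) {
  return [...new Set(ids)];
}

// Sketch only: send the whole id array in one round trip as a
// table-valued parameter. Assumes the `mssql` package, a table type
// dbo.IdList (ID int), and a stored procedure dbo.usp_MatchIds that
// selects from my_table joined against the TVP - all hypothetical names.
async function countMatchesTvp(pool, ids) {
  const sql = require('mssql'); // required lazily so the sketch loads without it
  const tvp = new sql.Table();
  tvp.columns.add('ID', sql.Int);
  for (const id of uniqueIds(ids)) tvp.rows.add(id);

  const request = pool.request();
  request.input('ids', tvp); // one parameter instead of thousands of literals
  const result = await request.execute('dbo.usp_MatchIds');
  return result.recordset.length;
}
```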

1 Answer


The current query has to parse the whole set of ids passed into the `in` clause at runtime. The larger the number of ids, the greater the performance impact.

I would try inserting the 10K ids into a temp table first (creating an index on the id column if needed) and then querying:

select *
  from my_table 
 where id in (select id from temp_table)
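One way to populate that temp table without thousands of individual INSERT statements is a bulk load; a sketch with the `mssql` package follows (`#ids`, the column name, and the pool argument are illustrative). Note that a local temp table is visible only on the connection that created it, so the bulk load and the follow-up query must share a connection.

```javascript
// Sketch only: bulk-load the ids into a local temp table, then join.
// Assumes the `mssql` package; #ids and my_table are illustrative names.
// A #temp table is scoped to the connection that created it, so the
// bulk load and the follow-up query reuse the same request here.
async function countMatchesTempTable(pool, ids) {
  const sql = require('mssql'); // required lazily so the sketch loads without it
  const table = new sql.Table('#ids');
  table.create = true; // create the temp table before loading
  table.columns.add('id', sql.Int, { nullable: false });
  for (const id of ids) table.rows.add(id);

  const request = pool.request();
  await request.bulk(table); // one bulk insert instead of N INSERT statements
  const result = await request.query(
    'select * from my_table where ID in (select id from #ids)'
  );
  return result.recordset.length;
}
```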
  • Thanks, that was my former way of working. This means inserting every 10k into a temp table as another step. So if I have 10k in total, I would need to build 10k insert queries, and I'd also have to run those in chunks, because running them one by one would slow down the process; and if I have 100k, running 100k queries would probably time out too. So I'd apply the same logic I currently use just for the insert - basically an added step to the process. – Gil Allen May 24 '21 at 07:45
  • If you use a table type *parameter* @GilAllen , then your application can parametrise the variable; you can't do that with a temporary table. – Thom A May 24 '21 at 07:59
  • either way, using a temp table or table variable still requires the step of pushing the data into the table/variable, which needs handling similar to what I currently do - finding the right optimized amount/size of query to run so it won't be too long/big – Gil Allen May 24 '21 at 08:09
  • @GilAllen There is already a process to store the array of ids; rather than storing it into an array, store it into the temp_table or the table_variable. After that you may use it in the query. Pagination of a query in SQL Server can be done using OFFSET and FETCH NEXT. See the example at https://www.sqlshack.com/pagination-in-sql-server/#:~:text=What%20is%20Pagination%20in%20SQL,pagination%20solution%20for%20SQL%20Server. – George Joseph May 25 '21 at 01:58