
I get up to 1000 IDs from another server and need to display them to visitors, so I have to use an IN query like:

SELECT * FROM `table` WHERE `id` IN (23221, 42422, 2342342....) -- and so on, up to 1000

Let's say 1/3 of the visitors will watch through all of the 1000 IDs while 2/3 of them will only watch the first 50.

What would be better for performance/workload: one query for all 1000 IDs, or splitting them into, say, 20 queries of 50 IDs each? So when the first 50 have been watched, query for the next 50, and so on.

EDIT:

I don't need to use LIMIT when splitting, which means the query would contain 50 IDs at most. So what's better: one query with 1000 IDs at once, or 20 queries with 50 IDs each?

EDIT:

OK, to ask it more briefly and directly: are 1000 IDs in one query too much? I have read here, How to optimize an SQL query with many thousands of WHERE clauses, that tons of WHERE/OR conditions are bad?

  • If you are not displaying all of them, there is no need to select them (except for the total row count for paging). – Vatev Oct 02 '18 at 14:10
  • Yeah, but as mentioned, 1/3 of users will watch through all of them, and there will be UP TO 1000 IDs, so maybe half of the time it's less than 100 –  Oct 02 '18 at 14:12
  • I come from a mssql background, and not a mysql background, but if I were doing this in mssql, I would write it both ways and examine the performance using the execution plan and the results from running SQL Server Profiler to see what the difference between the two approaches is. While it could be that running more queries will take more time than running the larger query, you may find that it will be quicker for the 2/3 of the cases, and it may also make the application more responsive depending on how the query performs. – dmoore1181 Oct 02 '18 at 14:19
  • Also, not sure if this is something in mysql or not, but it may be worth writing a stored procedure and passing in a user-defined table type parameter. This may gain some performance as you will have a predetermined execution plan (this is at least possible in the mssql world, not sure about mysql). – dmoore1181 Oct 02 '18 at 14:22
  • Just to answer the edit: 1000 IDs isn't much at all. And I'd rather do 1 query to bring 1000 than handle 20 different queries. – Juan Carlos Oropeza Oct 02 '18 at 14:28
  • @dmoore1181 _offtopic_ how would you have a predetermined query plan with a dynamic table parameter? :P – Vatev Oct 02 '18 at 14:29
  • "1000 ids, isnt much at all" I really hope this is the case because as mentioned with the provided link "How to optimize an SQL query with many thousands of WHERE clauses" this guy had extremely problems with many OR's –  Oct 02 '18 at 14:32
  • @Vatev what do you mean? predetermined query plan? dynamic table parameter? Oo :D –  Oct 02 '18 at 14:36
  • Personally, with an IN statement I try to limit it to < 50, as performance generally starts to suffer. I'd probably dump the IDs to a temp table and join, or use WHERE EXISTS (a sketch of that temp-table approach follows after these comments). – xQbert Oct 02 '18 at 14:37
  • Thanks xQbert, but the problem is the IDs are the visitors of users; one user can have up to 1000 visitors. When I have 100,000 users it could be like 10 million rows for the visitors table. So I'd rather create one JSON file for each user with his visitors in it on another server. I'm sure it's a much better way than a 10-million-row table –  Oct 02 '18 at 14:40
  • @donkinged you might be surprised by the performance of a 10M-row table with 2 int columns which form the PK. Otherwise, queries with `IN(...)` are quite fast in my experience. There is a point at which the optimizer will throw in the towel, but that happens with a higher number of values (there is also a warning when it happens). – Vatev Oct 02 '18 at 14:54
  • But the IDs are 12 digits ^^ I'll keep doing it with JSON and one big query each, because the further problem with putting it all in tables is splitting when I reach like 1 million users :D I can distribute the JSON files across multiple servers. Managing all that with databases only would be harder, I guess, and I have no idea how to do that; I already looked it up –  Oct 02 '18 at 15:01
  • bigint can hold 12 digits (and you can get as much as you want with binary blobs..). If it works with the JSON files, that's OK - do the queries with `IN(...1k ints...)`. As I said earlier, there is a limit, but it's more than 10K (if I remember correctly) and the performance hit is very severe, so you will notice it :P – Vatev Oct 02 '18 at 15:16
  • But I have more to handle than visitors; it's also favorites, messages and more, which I do with JSON. Using all those tables with inner joins would mean a massive amount of data on one server. I already searched for splitting databases/tables across multiple servers but can't find anything about that. So if I use a database for everything and reach the limit, I'm totally screwed, while JSON is very flexible. I would appreciate it if you could show me tutorials on splitting tables/databases across multiple servers :D I think I will limit last visitors from 1000 to 300 or something –  Oct 02 '18 at 15:42
  • @Vatev, I may have misworded that, but basically when you have a stored procedure, once it has been run, the query plan is cached and can be reused the next time, which makes the query perform better by removing the time to compile a new query plan. When passing a dynamic number of parameters, you are not guaranteed that a cached query plan is going to be used. – dmoore1181 Oct 02 '18 at 19:52
  • Try both approaches. You will probably find that the overhead for issuing a query is the costly part. – Rick James Oct 04 '18 at 19:47
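
A minimal sketch of the temp-table-and-join idea xQbert suggests above, assuming MySQL, the question's `table`, and BIGINT IDs (the temporary table name `wanted_ids` is a placeholder):

  -- Hypothetical sketch: load the wanted IDs once, then join instead of using a long IN list.
  CREATE TEMPORARY TABLE wanted_ids (id BIGINT PRIMARY KEY);

  INSERT INTO wanted_ids (id) VALUES (23221), (42422), (2342342);  -- ... up to 1000 rows

  SELECT t.*
  FROM `table` AS t
  INNER JOIN wanted_ids AS w ON w.id = t.id;

  DROP TEMPORARY TABLE wanted_ids;

Whether this beats a plain `IN(...)` list is worth measuring with EXPLAIN on your own data, as the comments suggest.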

2 Answers

2

Let's say 1/3 of the visitors will watch through all of the 1000 IDs while 2/3 of them will only watch the first 50.

So you want to optimize your response based on how you assume visitors will behave.

What would be better for performance/workload: one query for all 1000 IDs, or splitting them into, say, 20 queries of 50 IDs each? So when the first 50 have been watched, query for the next 50, and so on.

Yes, you are correct that you should limit the returned response. Here is one example of how you can implement your requirement (I don't know much MySQL, but this is how you could get the desired result):

SELECT * FROM `table` WHERE `id` IN (23221, 42422, 2342342....)
order by `id`
LIMIT 10 OFFSET 10

If it were SQL Server:

  create procedure sp_SomeName
  @id varchar(8000),  -- declared but not used in this sketch; the IN list below is hardcoded
  @skip int,
  @take int
  as
  begin
     SELECT * FROM some_table WHERE id IN (23221, 42422, 2342342....)
     order by id
     OFFSET @skip ROWS --if 0 then start selecting from the first row
     FETCH NEXT @take ROWS ONLY --if 10 then this is the maximum number of rows returned
  end

What the query above does: it gets all the data for the posted IDs, then orders it by id in ascending order. From there it picks just the first 10/50/100; the next time it picks the next 10/50/100, or whatever your take and skip values are. Hope this helps man :)
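
For instance (a sketch reusing the MySQL query and the ID list from the question), paging through 50 at a time just means advancing the offset on each request:

  -- first 50 rows
  SELECT * FROM `table`
  WHERE `id` IN (23221, 42422, 2342342)  -- ... and so on, up to 1000 IDs
  ORDER BY `id`
  LIMIT 50 OFFSET 0;

  -- next 50 rows, just advance the offset
  SELECT * FROM `table`
  WHERE `id` IN (23221, 42422, 2342342)  -- same ID list
  ORDER BY `id`
  LIMIT 50 OFFSET 50;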

ARr0w
  • Of course they are too much for a grid (if paging is not applied to it). – ARr0w Oct 02 '18 at 14:27
  • If you are handling paging in the interface, then it's fine. If not, you can handle it from SQL like I just told you. – ARr0w Oct 02 '18 at 14:27
  • For a grid? You mean how to display them?? That is another topic; the question is only and exclusively about server performance/workload –  Oct 02 '18 at 14:29
  • Fetching 1000 IDs using the `IN` syntax is fine, but if you want even faster performance then the above is one of the ways you can manage that. Jeesh. – ARr0w Oct 02 '18 at 14:32
  • OK, then I'll go for that :D –  Oct 02 '18 at 14:34
0

You can look at the answer provided here: MySQL Data - Best way to implement paging?

With the LIMIT statement you can return only a portion of the result. And by changing the parameters in the LIMIT statement, you can re-use the query.

Do know that unless you use an ORDER BY, a SQL server does not always return the same records. In other words, should a record be unavailable to read due to an update in progress, the database server may fetch the next record instead (to give a result as soon as possible). I do not know for sure whether LIMIT forces a database server to take some sort of order into consideration (I am not that familiar with MySQL).
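
To illustrate the re-use point (a sketch, assuming MySQL server-side prepared statements, which allow placeholders in LIMIT/OFFSET; the table and IDs are the ones from the question, and `page_stmt` is a placeholder name):

  -- Prepare the paging query once, then re-run it with different skip/take values.
  PREPARE page_stmt FROM
    'SELECT * FROM `table` WHERE `id` IN (23221, 42422, 2342342)
     ORDER BY `id` LIMIT ? OFFSET ?';

  SET @take = 50, @skip = 0;
  EXECUTE page_stmt USING @take, @skip;   -- first page of 50

  SET @skip = 50;
  EXECUTE page_stmt USING @take, @skip;   -- next page of 50

  DEALLOCATE PREPARE page_stmt;

The explicit ORDER BY is what makes the pages deterministic, per the caveat above.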

incomudro
  • No, I don't need to order them, and I know about paging/LIMIT etc., but what would be the better workload? I don't know exactly how SQL handles the queries, caching etc. –  Oct 02 '18 at 14:08
  • I do not know how MySQL does caching. I know that with SQL Server it is best to provide queries with parameters (something an ORM usually does). That way SQL Server can reuse the query execution plan it has generated. (If the parameters are inside the query text itself, each query will look different to the cache and thus will be stored separately.) But normally, as performance is often not very linear, I would say to test it, especially as you are comparing a lot less data being retrieved versus fewer queries being sent to the database server. – incomudro Oct 02 '18 at 14:17