3

I need to retrieve records that match IDs stored in a list. The query, generated at runtime, is simple:

SELECT [whatever FROM sometable] WHERE (id = 1) or (id = 5) or (id = 33).

Which is equivalent to

SELECT [whatever FROM sometable] WHERE [id] IN (1, 5, 33);

This is fine, but what if the list contains hundreds or thousands of IDs? The statement will be huge and at some point the SQL parser might croak, or if it does not, performance will probably be quite bad. How can I do this in a way that is not so sensitive to the number of records being retrieved?

(The reason I can't just loop over the list and retrieve records one by one is that I need the database to do ORDER BY for me. Records must come from the DB ordered by a specific field, while the list represents records selected by the user in a grid which can be sorted in any number of ways. And yes, I could sort the records in code after I retrieve them, but that's plan B, since I don't even need to hold them all in one data structure, only to come properly ordered.)

Marek Jedliński
  • 7,088
  • 11
  • 47
  • 57
  • can you put all ids in a temporary table, and say SELECT [whatever FROM sometable] WHERE [id] IN (SELECT id FROM #temptable) ? – Akhil Jan 27 '12 at 23:37
  • I had more thoughts. 1)Good design would keep any list from showing thousands of id's. Humans can't look at more than a screen worth at a time anyways. And chances are the humans are doing some aggregation on this info. Do it for them! 2) I simply cannot be convinced there is a single reason to loop queries. EVER! (I might catch some heat there). Never! SQL is too powerful. This indicates an issue with the model. Take your model to 3rd Normal form. Its a small amount of work, but it pays in dividends in scalability. – Matt Jan 30 '12 at 22:27
  • @Matt: (1) Long story, I was hoping to avoid it :) It's a list of clips in a clipboard manager, but it could be URLs in a bookmark manager, etc. The display may already be filtered (e.g. only items added today), and user typically selects only one item, or a few at a time. But the app does not impose any arbitrary limits. If it happens to be showing 2k items and user hits Ctrl+A and then "Export", say, I need to handle this gracefully, even if it's a rare scenario. (cont.) – Marek Jedliński Jan 31 '12 at 03:21
  • (cont.) And because these are clips, often the sensible thing to do is order them chronologically, even if they are currently sorted alphabetically or by length, etc (user decides). This is why the order of the selection may not be the order in which the records need to come from the db. (2) I do agree about looping. I asked because I was trying to avoid it. – Marek Jedliński Jan 31 '12 at 03:23
  • Since there is a chronological attachment to this, page them by date. I still think its terrible practice to put that many records in a list. Have a look at how google chrome does it. History / Bookmarks. Add an export by date range filter, export by selected in current view, and export all feature and you should be golden for just about any user. Or if they really do need that much granularity instead of providing the big view, provide a filter control system which will help them narrow the results in the list. Or even a "contains" export feature. – Matt Feb 01 '12 at 18:20

5 Answers5

5

If you're really going to have so many IDs that you're worried about the SQL parser croaking, you can store them into a temporary table and do a cross-join.

Simply create the table with one (primary key) column, the ID, then populate it with the desired IDs and use something like:

SELECT [whatever] FROM [sometable] st, [idtable] it
WHERE st.id = it.id

That query won't choke any parser and the rows retrieved will be limited to those having the ID in the temporary table.

This doesn't have to be a temporary table, of course, you can leave it lying around provided you ensure only one "thing" uses it at a time.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
1

Could you add these items to a table and then join to it?

SELECT Whatever FROM TableA CROSS JOIN TableB ON TableA.ID = TableB.ID
Matt
  • 1,441
  • 1
  • 15
  • 29
0

Presumably there is some logic involved in determining the list of ids to retrieve, could this logic not be incorporated into the where clause? Perhaps you could join to a table that contains some kind of session id along with the required ids?

ninesided
  • 23,085
  • 14
  • 83
  • 107
  • That is essentially what he is doing using the IN clause, but he's asking at what point does that list get so big that the IN is no longer efficient, and if that is a problem, what can he do about it... – Matt Jan 27 '12 at 23:42
  • I meant that the selection of the ids is based on some piece of data that would be better used in the where clause rather than the id. For example, if we were selecting all of the ids of users with no hair, you could use `WHERE hair='N'` rather than a list of ids. My example is overly trivial, but you get the drift. – ninesided Jan 27 '12 at 23:53
  • @Ninesided: There's no piece of data in the db. As I explained in the question, user selects (Shift+click or Ctrl+click) multiple records in a grid. I generate a list that holds IDs of the selected records. Now, I could store the "selected" flag in the database, but to do so I must solve the same exact problem that prompted the original question. – Marek Jedliński Jan 28 '12 at 10:21
  • 1
    This may not actually be helpful, however, if you are selecting rows from a multi-select box I doubt your end user would get it to the point of choking the compiler. With 1 exception. An "All" item in your list box. In this case I might consider a second query to handle that situation. Also, if there truly were that many records being selected outside an all condition, you may consider finding some normalized way of grouping them in another table. Perhaps a group id, or some quantifier that could be stored separately. Or look at exclusion rather in inclusion lists. Thought I'd share :) – Matt Jan 30 '12 at 22:04
0

If the values in your where clause are in a table you can do this.

select id from foo where id in (select id from bar)
Alex
  • 2,350
  • 2
  • 20
  • 17
0

Your programming language should support prepared queries, i.e.

 SELECT [whatever FROM sometable] WHERE (id = ?);

or:

 SELECT [whatever FROM sometable] WHERE (id = @id);

By preparing a query, the query is available for repeated used and you can bind the parameters to a variable in your native programming language. If this search is high frequency, its worthwhile to keep these prepared queries for some duration.

There's a useful discussion here:

Community
  • 1
  • 1
Stephen Quan
  • 21,481
  • 4
  • 88
  • 75
  • I know what a prepared query is and I use them where possible. Unfortunately, this does not apply to the question I'm asking :) – Marek Jedliński Jan 28 '12 at 10:18
  • Not helpful for SQLite, but in PostgreSQL the query would be `SELECT whatever FROM sometable WHERE id = SOME $1::integer[]`. But in SQLite, which doesn't have arrays, I think a temporary auxiliary table is your best bet. – Ryan Culpepper Jan 29 '12 at 00:49