Is it feasible to use this brute-force method for this SQL query?

Question

I have a list of 6700 IDs, stored in an Excel file, that I want to search in the Database. I have a query like this:

select distinct RespondentID,  QuestionID,Condition
from RespondentProfiles (nolock) 
where RespondentID in (10286334,13988335, 18834444,26134155, 8658775)
and QuestionID in (113)

And my IDs are the RespondentIDs above, they look like thus:

Excel says that it can save as comma-separated values. But I wasn't able to do this(i.e I want a text file with all the values, comma-separated) then I'd cut & paste into the query. Which would look a bit ugly no-doubt but I think it's feasible, or is it too slow?

thanks!

I've done this loads of times; it works fine. I'd take the DISTINCT out though; that's likely to slow it down. If those three fields aren't already unique (which I'd guess they are), doing a GROUP BY on them will still be faster. — Braiba, Jul 10 '12 at 15:34
@Braiba, distinct is exactly as fast as group by. It is the same. — usr, Jul 10 '12 at 15:35
Hmm, perhaps this is true usr - http://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql — Caffeinated, Jul 10 '12 at 15:37
Why are you doing this with an Excel file? Get your list into the database first, then run queries against it. — Aaron Bertrand, Jul 10 '12 at 15:39
@AaronBertrand - So there was another Database instance that I didn't have access to(i.e. I only have access to the backup database :\ ), which produced the long list for me. So I cannot reproduce the long 6700 within MSSQL — Caffeinated, Jul 10 '12 at 15:41
@usr I've had instance where DISTINCT worked out ~10 times slower, although admittedly I have no idea why this would be the case. — Braiba, Jul 10 '12 at 16:34

score 1 · Accepted Answer · answered Jul 10 '12 at 15:32

1

Well - for a list of IDs there isn't really another way to cherry pick a load of them

If the question is speed - then it's going to be as fast as the indexes allow it to be. Is RespondentID a clustered key/index?

If so it shouldn't be too bad as it will just be a load of index seeks - and the clustered index (if it is one) should cover this query

answered Jul 10 '12 at 15:32

Charleh

13,749
3
37
57

I'm not sure I understand the 'clustered index' terminology, let me read up! thnks – Caffeinated Jul 10 '12 at 15:34
A clustered index is usually the primary key on a table - i.e. is the RespondentID an index or primary key? – Charleh Jul 10 '12 at 15:34
OK I just checked - it's an index, because the first column(#1) is ID, then it's RespondentID , then another column called QuestionID – Caffeinated Jul 10 '12 at 15:38
Not sure by your reply but just checking - do you know the difference between an indexed column and non-indexed column? Just because something isn't the PK doesn't mean it's indexed – Charleh Jul 10 '12 at 15:41
I think I need to study that more. My intuition says indexed column is an int type. Not sure of it though – Caffeinated Jul 10 '12 at 15:43
1

No an indexed column is a column with an index on it - an index is basically a way for Sql to quickly find data. If you place an index which covers your query on the table the data access can be many times faster. When you run your query in Management Studio, there's an option you can turn on called 'Show Actual Execution Plan'. This shows you what SQL did to get the data, and the cost associated with the operations. In this execution plan you might see some table scans if indexes aren't used. Scans are generally not as quick as seeks (generally!). Might be worth reading up on indexes – Charleh Jul 10 '12 at 15:45
Thank you so much - That's very helpful , I will find the how Actual Execution Plan' and study indexes. Thank You! – Caffeinated Jul 10 '12 at 15:47
1

Have a look at this link as it explains indexes quite well: http://odetocode.com/Articles/70.aspx (it's a bit old but it gives you the basics). After looking at that, might be worth looking at the new stuff in SQL2005+ like included columns etc – Charleh Jul 10 '12 at 15:47

score 1 · Answer 2 · answered Jul 10 '12 at 15:41

1

In the above query, if you have QuestionID indexed, the server will find the rows where QuestionID is 113 using the index. It will then scan all rows where QuestionID is 113 and see if the RespondentID is in the list of IDs, using a binary search.

So, if there are 100 rows where QuestionID is 113, it will scan all 100 rows, but not the entire data set.

You could speed this up with a covering index, so that the server would use an index scan as opposed to a table scan. Simply create a compound "covering" index on QuestionID, RespondentID, and Condition, in that order.

There may be some limitation on the size of the covering index, depending on the size of the columns. If you post your schema, we can provide more detailed answers.

answered Jul 10 '12 at 15:41

Marcus Adams

53,009
9
91
143

MySql might do that but Sql 2008 might do something else! – Charleh Jul 10 '12 at 15:42
1

@Charleh, sorry, SQL 2008 works the same way with regard to indexes and covering indexes. I changed my answer. Thanks for catching that. – Marcus Adams Jul 10 '12 at 15:47
1

Haha yeah I know I was just poking you to see if you were awake :D – Charleh Jul 10 '12 at 15:53
Thank you very much Marcus - I will study up on covering indexes – Caffeinated Jul 10 '12 at 15:54

Is it feasible to use this brute-force method for this SQL query?

2 Answers2