-1

I have a list of 6700 IDs, stored in an Excel file, that I want to search in the Database. I have a query like this:

select distinct RespondentID,  QuestionID,Condition
from RespondentProfiles (nolock) 
where RespondentID in (10286334,13988335, 18834444,26134155, 8658775)
and QuestionID in (113) 

And my IDs are the RespondentIDs above, they look like thus:

10286334,
13988335, 
18834444,
26134155,
 8658775,
 1616806,
 3470656,
30981782,
29302573,
29043779,
12899462,
 7420528,
17040183,
26129711,
19542507

Excel says that it can save as comma-separated values. But I wasn't able to do this(i.e I want a text file with all the values, comma-separated) then I'd cut & paste into the query. Which would look a bit ugly no-doubt but I think it's feasible, or is it too slow?

thanks!

Kevin Aenmey
  • 13,259
  • 5
  • 46
  • 45
Caffeinated
  • 11,982
  • 40
  • 122
  • 216
  • 1
    I've done this loads of times; it works fine. I'd take the DISTINCT out though; that's likely to slow it down. If those three fields aren't already unique (which I'd guess they are), doing a GROUP BY on them will still be faster. – Braiba Jul 10 '12 at 15:34
  • @Braiba - OK cool, I'll use GROUP BY rather than DISTINCT – Caffeinated Jul 10 '12 at 15:35
  • 1
    @Braiba, distinct is exactly as fast as group by. It is the same. – usr Jul 10 '12 at 15:35
  • Hmm, perhaps this is true usr - http://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql – Caffeinated Jul 10 '12 at 15:37
  • 2
    Why are you doing this with an Excel file? Get your list into the database first, then run queries against it. – Aaron Bertrand Jul 10 '12 at 15:39
  • @AaronBertrand - So there was another Database instance that I didn't have access to(i.e. I only have access to the backup database :\ ), which produced the long list for me. So I cannot reproduce the long 6700 within MSSQL – Caffeinated Jul 10 '12 at 15:41
  • 1
    @usr I've had instance where DISTINCT worked out ~10 times slower, although admittedly I have no idea why this would be the case. – Braiba Jul 10 '12 at 16:34

2 Answers2

1

Well - for a list of IDs there isn't really another way to cherry pick a load of them

If the question is speed - then it's going to be as fast as the indexes allow it to be. Is RespondentID a clustered key/index?

If so it shouldn't be too bad as it will just be a load of index seeks - and the clustered index (if it is one) should cover this query

Charleh
  • 13,749
  • 3
  • 37
  • 57
  • I'm not sure I understand the 'clustered index' terminology, let me read up! thnks – Caffeinated Jul 10 '12 at 15:34
  • A clustered index is usually the primary key on a table - i.e. is the RespondentID an index or primary key? – Charleh Jul 10 '12 at 15:34
  • OK I just checked - it's an index, because the first column(#1) is ID, then it's RespondentID , then another column called QuestionID – Caffeinated Jul 10 '12 at 15:38
  • Not sure by your reply but just checking - do you know the difference between an indexed column and non-indexed column? Just because something isn't the PK doesn't mean it's indexed – Charleh Jul 10 '12 at 15:41
  • I think I need to study that more. My intuition says indexed column is an int type. Not sure of it though – Caffeinated Jul 10 '12 at 15:43
  • 1
    No an indexed column is a column with an index on it - an index is basically a way for Sql to quickly find data. If you place an index which covers your query on the table the data access can be many times faster. When you run your query in Management Studio, there's an option you can turn on called 'Show Actual Execution Plan'. This shows you what SQL did to get the data, and the cost associated with the operations. In this execution plan you might see some table scans if indexes aren't used. Scans are generally not as quick as seeks (generally!). Might be worth reading up on indexes – Charleh Jul 10 '12 at 15:45
  • Thank you so much - That's very helpful , I will find the how Actual Execution Plan' and study indexes. Thank You! – Caffeinated Jul 10 '12 at 15:47
  • 1
    Have a look at this link as it explains indexes quite well: http://odetocode.com/Articles/70.aspx (it's a bit old but it gives you the basics). After looking at that, might be worth looking at the new stuff in SQL2005+ like included columns etc – Charleh Jul 10 '12 at 15:47
1

In the above query, if you have QuestionID indexed, the server will find the rows where QuestionID is 113 using the index. It will then scan all rows where QuestionID is 113 and see if the RespondentID is in the list of IDs, using a binary search.

So, if there are 100 rows where QuestionID is 113, it will scan all 100 rows, but not the entire data set.

You could speed this up with a covering index, so that the server would use an index scan as opposed to a table scan. Simply create a compound "covering" index on QuestionID, RespondentID, and Condition, in that order.

There may be some limitation on the size of the covering index, depending on the size of the columns. If you post your schema, we can provide more detailed answers.

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143