Selecting records with a huge "where data set"

Question

Background Info

C#
MS MVC 4
Sql Azure
Linq - Identities

Problem at hand:

Selecting records in an Items table where zip code is within a certain range of miles.

Items Table

id (PK)
Title
Body
ZipCode (Int)

Summary of Progress:

I have a class which uses the 2013 US Gazatteer zip code and tabulation areas to gather zip codes and assess distances between zip codes. It is basically a .csv/.txt file that I open into a stream and convert to POCOs in order to process distances. That much of the equation is working fine; however, selecting a list of Items from an Items table based on this list of zip codes is where I'm not sure what to do.

Scenario

User A wants to search for items within a 25 miles radius of area code 46324. User A hits search and in the background my class returns a list of 124 zip codes within a 25 mile radius.

Question: What is the best way (performance wise) to retrieve items in my Item table using this list of zipcodes?

Possible Solutions

I thought about creating a dynamic query using the tsql in keyword within my where clause and simply supplying this list as the where parameters. This does not seem to be a very performance oriented way of doing this; however, considering my current architecture I do not see any other way.

I also thought about incorporating a sort of paging functionality that will only take the first 5 zip codes to return results followed by the next 5 and so on and so on. This will involve more work but it definitely would seem to be a better performance choice.

Any ideas?

Where does the 'huge' mentioned in the title come from? Where is the bottle neck exactly? — NoChance, Jan 05 '14 at 01:27
If someone were to search for items within a 100 miles radius it could return thousands of zipcodes. From my understanding, having too strict of a where clause is bad for performance. I can't imagine a query with a list of 1000 where values to be quick. Or am I wrong? — Adrian, Jan 05 '14 at 01:30
You either accept the fact that the user can ask for the large information or not. This is a business decision. However, in general, it should have an upper boundary or you may have to find out additional filtering criteria. It is not uncommon to return 1000s of rows but the number alone is not indicative, other factors such as how many queries are processed at the time your query runs, what is the actual data size, etc. For 1000+ rows or so, paging to be considered in reasonable chuncks (5 is too low). However paging also loads the server. Again, the number of concurrent users is a factor. — NoChance, Jan 05 '14 at 01:40
I feel like perhaps you misunderstood exactly what I was aiming it. Let me try to explain further. It's not the query's return of 1000 rows that I am worried about its the fact of having a query with a where clause that might look something like this `WHERE ZipCode IN ('46321', '46323', '60438', '46327', '60409', ......'n zip code')` Lets say there are 1000 zip codes in that where clause. Would the performance not suffer because of that? I have never had to deal with such a specific where clause and am worried this might not be the correct route to take. — Adrian, Jan 05 '14 at 01:49
OK, I see, In fact I don't know, although I think it would have a negligible effect. However, I think there is a limit to how long a sql string may be:http://stackoverflow.com/questions/1869753/maximum-size-for-a-sql-server-query-in-clause-is-there-a-better-approach — NoChance, Jan 05 '14 at 01:56
Seem and imagine. Did you test? Why would you think multiple queries would be faster than one? — paparazzo, Jan 05 '14 at 09:57

score 0 · Accepted Answer · answered Mar 23 '14 at 12:16

I stumbled across your question purely by chance searching for something else, and also I see it's quite old, but I thought I'd give you a comment none the less:

What I would do in this case is actually allow the database to do the search and the C# to do the calcs. You have a class in C# which calculates the distances? Then why not save the distance from each zip code to each zip code in a "lookup table" in sql.

Doing it this way makes sure that the data is calculated once but you let the sql find the right data for you.

ie:

Create a table with from_zip, to_zip, distance fields
Calculate and populate table once at the beginning
Query by saying "select * from zip_lookup where zip_from = bla and distance between 0 and 100" or something like that

Hmm, yet another interesting idea on how to do things. I did manage to get this working; however, the way I had things setup it did not allow to order by distance. I think this might solve that problem. — Adrian, Mar 23 '14 at 19:05