
I have a large database of businesses (about 500,000) with zipcode, address, etc. I need to display those within 100 miles of the user's zipcode, in ascending order of distance. I have a table of zipcodes with their latitude and longitude. Which would be the faster/better solution?

Case 1: Calculate the distance and sort by it. I will have the user's current zipcode, latitude and longitude in session, and I will calculate the distance using a SQL Server function.

Case 2: Get all zipcodes within a 50-mile area and then get the businesses with those zipcodes. Here I would have to write a nested select when finding businesses.

I think case 1 will calculate the distance for every business in the database, while case 2 will just fetch the zipcodes and end up fetching only the required businesses. Hence case 2 should be better? I would appreciate any suggestions.

Here is the LINQ query I have for case 1.

var businessListQuery = (from b in _DB.Businesses
                         let distance = _DB.CalculateDistance(b.Zipcode,userLattitude,userLogntitude)
                         where b.BusinessCategories.Any(bc => bc.SubCategoryId == subCategoryId)
                               && distance < 100
                         orderby distance
                         select new BusinessDetails(b, distance.ToString()));

int totalRecords = businessListQuery.Count();
// Page on the query itself so Skip/Take run in the database rather than after loading every row
var ret = businessListQuery.Skip(startRow).Take(pageSize).ToList();

On a side note, the app is in C#.

Thanks

Conrad Frix
Pit Digger
  • Write the SQL, run the execution plans and statistics, and see what's faster. If performance is an issue, write a stored procedure, do it in the database, and compare that as well. – PMC Dec 15 '11 at 20:23
  • Create an indexed view and include Zipcode in the index; that should give you faster results. Otherwise, change your index based on your search and ordering. – KuldipMCA Dec 15 '11 at 20:37
  • simple and very fast method http://stackoverflow.com/questions/3983325/calculate-distance-between-zip-codes-and-users/3989830#3989830 – Jon Black Dec 15 '11 at 23:42
  • @f00 But in this case, if I have to change the distance, the table will have to be repopulated. – Pit Digger Dec 16 '11 at 15:05

1 Answer

You could do worse than look at the GEOGRAPHY datatype, for example:

CREATE TABLE Places
(
    SeqID       INT IDENTITY(1,1),
    Place       NVARCHAR(20),
    Location    GEOGRAPHY
)
GO
INSERT INTO Places (Place, Location) VALUES ('Coventry', geography::Point(52.4167, -1.55, 4326))
INSERT INTO Places (Place, Location) VALUES ('Sheffield', geography::Point(53.3667, -1.5, 4326))
INSERT INTO Places (Place, Location) VALUES ('Penzance', geography::Point(50.1214, -5.5347, 4326))
INSERT INTO Places (Place, Location) VALUES ('Brentwood', geography::Point(52.6208, 0.3033, 4326))
INSERT INTO Places (Place, Location) VALUES ('Inverness', geography::Point(57.4760, -4.2254, 4326))
GO
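-- Every ordered pair of places, including each place paired with itself
SELECT p1.Place AS Place1, p2.Place AS Place2, p1.Location.STDistance(p2.Location) / 1000 AS DistanceInKilometres
    FROM Places p1
    CROSS JOIN Places p2
GO
-- Each pair of distinct places once only
SELECT p1.Place AS Place1, p2.Place AS Place2, p1.Location.STDistance(p2.Location) / 1000 AS DistanceInKilometres
    FROM Places p1
        INNER JOIN Places p2 ON p1.SeqID > p2.SeqID
GO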

geography::Point takes the latitude and longitude as well as an SRID (Spatial Reference Identifier). In this case, the SRID is 4326 (WGS 84), which is standard latitude and longitude; with it, STDistance returns metres, hence the division by 1000. As you already have latitude and longitude, you can just ALTER TABLE to add the geography column, then UPDATE to populate it.
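
As a rough sketch of that ALTER TABLE and UPDATE step: the table name Zipcodes and the column names Latitude and Longitude are assumptions, since the question doesn't give the real schema, so adjust them to match yours.

```sql
-- Assumed (hypothetical) schema: Zipcodes(Zipcode, Latitude, Longitude)
ALTER TABLE Zipcodes ADD Location GEOGRAPHY NULL;
GO
-- Separate batch, so the new column is visible to the UPDATE
UPDATE Zipcodes
    SET Location = geography::Point(Latitude, Longitude, 4326)
    WHERE Latitude IS NOT NULL
      AND Longitude IS NOT NULL;
GO
```

Rows with a missing latitude or longitude are left with a NULL Location rather than failing the whole update.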

I've shown two ways to get the data out of the table; however, you can't create an indexed view with this (indexed views can't have self-joins). You could, though, create a secondary table that is effectively a cache, populated from the queries above. You then just have to worry about maintaining it (which could be done through triggers or some other process).
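
One possible shape for such a cache table, filled from the sample Places table, is sketched below. The names PlaceDistances, Place1, Place2 and DistanceInKilometres are made up for illustration, and the primary key assumes place names are unique.

```sql
-- Hypothetical cache table holding precomputed pairwise distances
CREATE TABLE PlaceDistances
(
    Place1                  NVARCHAR(20) NOT NULL,
    Place2                  NVARCHAR(20) NOT NULL,
    DistanceInKilometres    FLOAT NOT NULL,
    PRIMARY KEY (Place1, Place2)   -- assumes Place values are unique
)
GO
-- Populate it from the cross join (every ordered pair)
INSERT INTO PlaceDistances (Place1, Place2, DistanceInKilometres)
    SELECT p1.Place, p2.Place, p1.Location.STDistance(p2.Location) / 1000
    FROM Places p1
    CROSS JOIN Places p2
GO
-- Lookup: everything within 100 km of Sheffield, nearest first
SELECT Place2, DistanceInKilometres
    FROM PlaceDistances
    WHERE Place1 = 'Sheffield' AND DistanceInKilometres < 100
    ORDER BY DistanceInKilometres
GO
```

Whenever a row in Places is added or changes its Location, the matching rows here would need refreshing, which is the maintenance cost mentioned above.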

Note that the cross join will give you 250,000,000,000 rows (500,000 × 500,000), but searching is simple as you only need to look at one of the place columns (i.e., SELECT * FROM table WHERE Place1 = 'Sheffield' AND distance < 100). The second will give you significantly fewer rows, but the query then needs to consider both the Place1 and Place2 columns.
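
If materialising every pair is too much, another option is to filter on STDistance directly at query time. This is only a sketch: it assumes a Zipcodes table that already has a GEOGRAPHY column called Location and a clustered primary key (which a spatial index requires), and a Businesses table with BusinessId, Name and Zipcode columns. All of those names are guesses at the asker's schema, and the user's coordinates are just the Coventry sample point.

```sql
-- Spatial index on the assumed Location column (the table needs a clustered primary key)
CREATE SPATIAL INDEX SIX_Zipcodes_Location ON Zipcodes (Location);
GO
-- User's position; sample coordinates only
DECLARE @userLocation GEOGRAPHY = geography::Point(52.4167, -1.55, 4326);
-- STDistance returns metres for SRID 4326; 1 mile = 1609.344 metres
DECLARE @radiusMetres FLOAT = 100 * 1609.344;

SELECT b.BusinessId,
       b.Name,
       z.Location.STDistance(@userLocation) / 1609.344 AS DistanceInMiles
    FROM Businesses b
        INNER JOIN Zipcodes z ON z.Zipcode = b.Zipcode
    WHERE z.Location.STDistance(@userLocation) <= @radiusMetres
    ORDER BY DistanceInMiles;
GO
```

The distance is worked out per zipcode rather than per pair of places, and changing the radius only means changing @radiusMetres. Whether the optimiser actually uses the spatial index is worth checking in the execution plan.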

Chris J