2

What would be the best way to structure a database and mvc app for the fastest look ups possible when trying to retrieve a list of items within a certain radius (in miles) and also depending on the item's zip code. I am open to suggestions even if it involves change to my current set up.

Simple Schema

I am currently using a .txt file that contains zip codes and lat/lon coordinates to calculate distances. With the use of the Haversine Formula you can calculate distance between two points using the coordinates.

I am using PagedList 1.17.0.0 to page my data on my index controller. The main problem I am facing pertains to EF. When selecting items within a 50 mile distance my SQL statements become too nested and throw errors in EF.

Current Flow of controller

public ActionResult Index(string sortOrder, string Categories, string Manufacturers, int? page, bool clearFilters = false
         ,string DistanceLimit = "", int Vehicle = 0)  

Each string in the parameters section is a possible filter for EF to use. Since I cannot tell EF to object.Where(P => P.Distance <= 50)

I generate a raw query using a string builder. My ZipCode class retrieves all zips within a 50 mile radius and then constructs a simple select with each zip code as a parameter in the where clause. This is where the problem occurs due to the fact that there are too many where clauses. 25 mile radius is fine, but not 50.

In the event that the above functionality works it fetches primary keys of items within that distance. I then have EF select Item.Include(P => P.Manufacturer).Include(P => P.Category).Include(3 other tables)

I then call ToList() on my IQueryable and remove any Items that are not within my previous list of PKs.

Finally, I call ToPagedList() and return that to my view within a viewmodel.

As you can see there is alot going on here.

For One: it does not work past a 25 miles radius.

2: it is overly complex and feels like a Rube Goldberg machine.

3: All these separate query executions significantly impact search result speed.

As I said earlier, I am open to any suggestions from anyone who has made a system with similar functionality. I have never dealt with geospatial data and am recently aware of SQL Server's geography type; however, from my understanding, it seems that I would have to call StDistance on every lat/lon record in order to figure out which items are near. Not to mention I will have to move my .txt file to the dB.

Adrian
  • 3,332
  • 5
  • 34
  • 52
  • 1
    Check out table-valued parameters for SQL: http://stackoverflow.com/questions/5595353/how-to-pass-table-value-parameters-to-stored-procedure-from-net-code. You can simply join to a table of zip codes instead of adding where clauses. – Keith Payne Jan 17 '14 at 23:09
  • How often does the the Item table change? Is it fairly static? How many rows in it? – simon at rcl Jan 22 '14 at 18:10
  • @simonatrcl the item's table changes alot. It is basically a catalog of items for sale. I figured out a solution using table valued parameters that I will post a little bit later. – Adrian Jan 28 '14 at 17:44

1 Answers1

1

I had to deal with same problem in my company when trying to retrieve items based on the radius. The only difference for our data we already store Lat, Long in database for each item that we need. In some cases we are starting to use GeographyPoint data type to remove additional workload on the server during the return. Some of our tools are unable to create Geography data type so its use is limited.

This is sample of the stored procedure that we use to do this. Look at the comments.

CREATE PROCEDURE [dbo].GetItems_ByRadius
(
 @pLat DECIMAL(20, 13)      --= 35.151
,@pLon DECIMAL(20, 13)      --= -86.59
,@pRadius DECIMAL(7, 2)     --= 2
)
AS
BEGIN
    SET NOCOUNT ON;
    /*Declare Local Variables to avoid parameter sniffing*/
    DECLARE @Lat VARCHAR(20) = @pLat
       ,@Lon VARCHAR(20) = @pLon
       ,@Radius DECIMAL(7, 2) = @pRadius
       ,@Earth_Radius INT = 6371000

    /*Declare additional variables that are needed for calculations*/   
    DECLARE @Distance DECIMAL(10, 2) = @Radius * 1609.344
       ,@Point_geo GEOGRAPHY
       ,@Min_Lat DECIMAL(20, 13)
       ,@Max_Lat DECIMAL(20, 13)
       ,@Min_Long DECIMAL(20, 13)
       ,@Max_Long DECIMAL(20, 13)

    /*Convert original Lat Long parameters to GeographyPoint this will be used to check radius distance*/
    SET @Point_geo = GEOGRAPHY::STGeomFromText('POINT(' + @Lon + ' ' + @Lat + ')', 4326)

    /*Build Bounding Box*/           
    SET @Min_Lat = @Lat - DEGREES(@distance / @Earth_Radius)
    SET @Max_Lat = @Lat + DEGREES(@distance / @Earth_Radius)    
    SET @Min_Long = @Lon - DEGREES(@distance / @Earth_Radius / COS(RADIANS(@Lat)))
    SET @Max_Long = @Lon + DEGREES(@distance / @Earth_Radius / COS(RADIANS(@Lat)));

    WITH    MyBoxResults
              AS ( SELECT *
                       ,GEOGRAPHY::STPointFromText('POINT(' + CAST(Geog_Long AS VARCHAR(20)) + ' '
                                                   + CAST(Geog_Lat AS VARCHAR(20)) + ')', 4326) AS GeogLocation
                    FROM MyTable
                    WHERE ( Geog_Lat BETWEEN @Min_Lat AND @Max_Lat )
                        AND ( Geog_Long BETWEEN @Min_Long AND @Max_Long )) \
/*Using Long and Lat decimal columns we get results that are within the box that surrounds radius*/
        SELECT *
            FROM MyBoxResults
            WHERE @Point_geo.STDistance(GeogLocation) <= @Distance; 
/*This further limits results to only the radius instead of original box.*/

END

Within procedure we made several ways to speed up performance.
- Use of local variables prevent Parameter Sniffing Read here more about it
- Use CTE to limit results to a bounding box that can be done fast with indexes on Lat and Long columns. - Using Geography functions and in perticular the STDistance only when needed. If that function is used on whole data set instead of limited results, procedure takes much longer to return.

Let me know if you have any questions about the procedure.