1

Given this the scenario:

  1. We have the order of 1,000,000 points around the world, specified by longitude and latitude;
  2. We have a circle c based on a point pc (specified by longitude and latitude) and a radius rc
  3. We want to efficiently determine which of the points are in the circle

I'm developing in C# and the locations stored in SQL server 2008.

So as I see it I have these 3 options:

  1. Store the locations as longitude latitude floats and perform the calculations in C#.

  2. Store the locations as geographical data types and perform the calculations in SQL server 2008 like this:

    CREATE TABLE UserLocations
    [UserId] [bigint] NOT NULL,
    [CurrentLocation] [geography] NOT NULL
    
    ALTER PROCEDURE sp_GetCurrentUsersInRange
    @userPoint geography, 
    @RangeInMeters int
    AS
    BEGIN
    
    select  UserId from UserLocations
    where @userPoint.STDistance(CurrentLocation) <= @RangeInMeters
    and UserId <> @userId
    
    END
    

    Disadvantages: problems using geographical data with LinqToSQL and LinqToEntities.

    Advantages: using dbms processing power over large data, and usage of the SQL Server spatial index.

3.Using some web service such as google's geolocation and calculation services. So far I didn't find such web service.

Which is the most efficient in your opinion? Please justify your answer.

Thank you

ozba
  • 6,522
  • 4
  • 33
  • 40
  • [Equation for testing if a point is inside a circle](http://stackoverflow.com/questions/481144/equation-for-testing-if-a-point-is-inside-a-circle) – Magnus Apr 18 '12 at 14:33
  • 1
    @Magnus we're on the surface of a sphere here, not on a plane – AakashM Apr 18 '12 at 15:30

2 Answers2

1

My naive approach would be to define a lat/long bounding box around the point pc and select from the database using BETWEEN on those box coordinates. Statistically around 79% of the points passing that test will be within the circle. A simple check in the code will weed out the ones outside the circle.

I say naive because I'm not familiar with the geometry capabilities of SQL Server.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • Too Naive: 1. 21% full scan is not efficient. 2. Read about the SQL server 2008 Advantages I added in the question – ozba Apr 19 '12 at 15:24
  • I wouldn't dismiss @Mark Ransom's suggestion so readily. From experience, when dealing with simple point-distance queries as you are, you'll probably find a solution like Mark's will outperform the geography datatype, even with use of a spatial index. SQL Server's spatial datatypes really excel when you are dealing with complex geometries like LineStrings or Polygons, or when you want more complex topological tests like STContains(), STCrosses(), or STRelate(), but for straightforward point-to-point distance calculations you can get better performance using logic like Mark's. – Alastair Aitchison Apr 19 '12 at 16:43
  • @ozba, who said anything about a full scan? If the latitude and longitude in your database are indexed, the server can quickly find the intersection of the two ranges. The 21% figure only applies to the small subset of the DB returned by the query. – Mark Ransom Apr 19 '12 at 17:00
  • Yet again, the spatial indexing on the geographic data type give better performance. see here: [SQL Server 2008 Spatial Indexing performance](http://www.mssqltips.com/sqlservertip/1976/sql-server-2008-spatial-index-performance) – ozba Apr 19 '12 at 20:47
  • @MarkRansom can you please give a code describing your "simple check" suggestion? So we can discuss the efficiency of it. – ozba Apr 19 '12 at 21:03
  • @ozba - the article you linked to compares the performance *of a query using the geography datatype* with or without a spatial index. Clearly a spatial index is likely to improve performance in that case. What Mark is suggesting is to perform an initial filter of results using a BETWEEN predicate on two regular numeric columns - i.e. lat and long - using a nonclustered index. I.e. only use the spatial methods to perform the high accuracy check of the candidate points actually contained in the circle. – Alastair Aitchison Apr 20 '12 at 10:46
0

An alternate to using a geometric circle, you could SELECT all records within a certain distance (using STDistance) from the circle's center. But I don't know whether or not it would be faster or slower than the intersection solution you have listed.

If the 100,000 points are static, you could probably code something in C#, loading the list into memory and using bounding boxes to minimize the usage of the distance calculation (ie, Haversine). It would probably be faster because you're minimizing I/O.

However, if the points are not static (or you have them stored in SQL Server), I'd opt for using SQL Server, it's a lot easier. You'd definitely want to create the proper spatial indices. SQL Server's spacial indexing is pretty good, you may find that it can even out-perform the in-memory solution I listed above.

I haven't used LINQ for this type of work, I usually do it old-school with a SqlConnection and a Reader. I have read that LINQ mixed with Spatials is a problem.

I don't know about Google, do they have such a web service?

Marc Bernier
  • 2,928
  • 27
  • 45
  • I would really appreciate If someone knows if Google have such a web service, cause I searched for it and didn't find something that answer the required scenario. – ozba Apr 18 '12 at 16:51
  • If the 100,000 points are yours (ie, not common landmarks, etc), then I doubt there is such a web service. This is more of a database task than a web service task. – Marc Bernier Apr 18 '12 at 17:12
  • If I could store them via the web service and query them afterwards, Than it is suitable – ozba Apr 18 '12 at 18:16