107

I am redesigning a customer database and one of the new pieces of information I would like to store along with the standard address fields (Street, City, etc.) is the geographic location of the address. The only use case I have in mind is to allow users to map the coordinates on Google maps when the address cannot otherwise be found, which often happens when the area is newly developed, or is in a remote/rural location.

My first inclination was to store latitude and longitude as decimal values, but then I remembered that SQL Server 2008 R2 has a geography data type. I have absolutely no experience using geography, and from my initial research, it looks to be overkill for my scenario.

For example, to work with latitude and longitude stored as decimal(7,4), I can do this:

insert into Geotest(Latitude, Longitude) values (47.6475, -122.1393)
select Latitude, Longitude from Geotest

but with geography, I would do this:

insert into Geotest(Geolocation) values (geography::Point(47.6475, -122.1393, 4326))
select Geolocation.Lat, Geolocation.Long from Geotest

Although it's not that much more complicated, why add complexity if I don't have to?

Before I abandon the idea of using geography, is there anything I should consider? Would it be faster to search for a location using a spatial index vs. indexing the Latitude and Longitude fields? Are there advantages to using geography that I am not aware of? Or, on the flip side, are there caveats that I should know about which would discourage me from using geography?


Update

@Erik Philips brought up the ability to do proximity searches with geography, which is very cool.

On the other hand, a quick test is showing that a simple select to get the latitude and longitude is significantly slower when using geography (details below). , and a comment on the accepted answer to another SO question on geography has me leery:

@SaphuA You're welcome. As a sidenote be VERY carefull of using a spatial index on a nullable GEOGRAPHY datatype column. There are some serious performance issue, so make that GEOGRAPHY column non-nullable even if you have to remodel your schema. – Tomas Jun 18 at 11:18

All in all, weighing the likelihood of doing proximity searches vs. the trade-off in performance and complexity, I've decided to forgo the use of geography in this case.


Details of the test I ran:

I created two tables, one using geography and another using decimal(9,6) for latitude and longitude:

CREATE TABLE [dbo].[GeographyTest]
(
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Location] [geography] NOT NULL,
    CONSTRAINT [PK_GeographyTest] PRIMARY KEY CLUSTERED ( [RowId] ASC )
) 

CREATE TABLE [dbo].[LatLongTest]
(
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Latitude] [decimal](9, 6) NULL,
    [Longitude] [decimal](9, 6) NULL,
    CONSTRAINT [PK_LatLongTest] PRIMARY KEY CLUSTERED ([RowId] ASC)
) 

and inserted a single row using the same latitude and longitude values into each table:

insert into GeographyTest(Location) values (geography::Point(47.6475, -122.1393, 4326))
insert into LatLongTest(Latitude, Longitude) values (47.6475, -122.1393)

Finally, running the following code shows that, on my machine, selecting the latitude and longitude is approximately 5 times slower when using geography.

declare @lat float, @long float,
        @d datetime2, @repCount int, @trialCount int, 
        @geographyDuration int, @latlongDuration int,
        @trials int = 3, @reps int = 100000

create table #results 
(
    GeographyDuration int,
    LatLongDuration int
)

set @trialCount = 0

while @trialCount < @trials
begin

    set @repCount = 0
    set @d = sysdatetime()

    while @repCount < @reps
    begin
        select @lat = Location.Lat,  @long = Location.Long from GeographyTest where RowId = 1
        set @repCount = @repCount + 1
    end

    set @geographyDuration = datediff(ms, @d, sysdatetime())

    set @repCount = 0
    set @d = sysdatetime()

    while @repCount < @reps
    begin
        select @lat = Latitude,  @long = Longitude from LatLongTest where RowId = 1
        set @repCount = @repCount + 1
    end

    set @latlongDuration = datediff(ms, @d, sysdatetime())

    insert into #results values(@geographyDuration, @latlongDuration)

    set @trialCount = @trialCount + 1

end

select * 
from #results

select avg(GeographyDuration) as AvgGeographyDuration, avg(LatLongDuration) as AvgLatLongDuration
from #results

drop table #results

Results:

GeographyDuration LatLongDuration
----------------- ---------------
5146              1020
5143              1016
5169              1030

AvgGeographyDuration AvgLatLongDuration
-------------------- ------------------
5152                 1022

What was more surprising is that even when no rows are selected, for example selecting where RowId = 2, which doesn't exist, geography was still slower:

GeographyDuration LatLongDuration
----------------- ---------------
1607              948
1610              946
1607              947

AvgGeographyDuration AvgLatLongDuration
-------------------- ------------------
1608                 947
Community
  • 1
  • 1
Jeff Ogata
  • 56,645
  • 19
  • 114
  • 127
  • 4
    I'm thinking of doing both, save the Lat and Lon in their own columns, and have another column for a Geography object, so if I just need the Lat/Lon I take them from the columns, and if I need proximity search I'll use the Geography. Is this wise? Are there any downsides (other than it takes more space...)? – Yuval A. Oct 18 '12 at 19:58
  • @YuvalA. that certainly sounds reasonable, and may be a good compromise. The only concern I have off the top of my head is whether having the Geography column in the table has any impact on queries against the table - I have no experience with that so you would need to test to verify. – Jeff Ogata Oct 22 '12 at 16:10
  • 1
    Why did you keep updating your question with new questions instead of asking new questions? – Chad Dec 06 '12 at 16:25
  • @Chad not sure what you mean. I updated the body of the question once, and it wasn't to ask more questions. – Jeff Ogata Dec 06 '12 at 16:29
  • 6
    It's worth noting, now, for those finding this question, that SQL Server 2012 includes significant performance increases with spatial indexing. Also of note is the fact that as long as you are storing location information, you can add in spatial information later using a lookup service to geocode your already stored addresses. – Volvox Jun 04 '13 at 04:46
  • @YuvalA there is not point in storing Lat and Long in there own columns because the sql for getting lat & long on a geography column storing a point is `select geo.Lat, geo.Long from table1`. – Erik Philips Oct 09 '13 at 07:00

3 Answers3

66

If you plan on doing any spatial computation, EF 5.0 allows LINQ Expressions like:

private Facility GetNearestFacilityToJobsite(DbGeography jobsite)
{   
    var q1 = from f in context.Facilities            
             let distance = f.Geocode.Distance(jobsite)
             where distance < 500 * 1609.344     
             orderby distance 
             select f;   
    return q1.FirstOrDefault();
}

Then there is a very good reason to use Geography.

Explanation of spatial within Entity Framework.

Updated with Creating High Performance Spatial Databases

As I noted on Noel Abrahams Answer:

A note on space, each coordinate is stored as a double-precision floating-point number that is 64 bits (8 bytes) long, and 8-byte binary value is roughly equivalent to 15 digits of decimal precision, so comparing a decimal(9,6) which is only 5 bytes, isn't exactly a fair comparison. Decimal would have to be a minimum of Decimal(15,12) (9 bytes) for each LatLong (total of 18 bytes) for a real comparison.

So comparing storage types:

CREATE TABLE dbo.Geo
(    
geo geography
)
GO

CREATE TABLE dbo.LatLng
(    
    lat decimal(15, 12),   
    lng decimal(15, 12)
)
GO

INSERT dbo.Geo
SELECT geography::Point(12.3456789012345, 12.3456789012345, 4326) 
UNION ALL
SELECT geography::Point(87.6543210987654, 87.6543210987654, 4326) 

GO 10000

INSERT dbo.LatLng
SELECT  12.3456789012345, 12.3456789012345 
UNION
SELECT 87.6543210987654, 87.6543210987654

GO 10000

EXEC sp_spaceused 'dbo.Geo'

EXEC sp_spaceused 'dbo.LatLng'

Result:

name    rows    data     
Geo     20000   728 KB   
LatLon  20000   560 KB

The geography data-type takes up 30% more space.

Additionally the geography datatype is not limited to only storing a Point, you can also store LineString, CircularString, CompoundCurve, Polygon, CurvePolygon, GeometryCollection, MultiPoint, MultiLineString, and MultiPolygon and more. Any attempt to store even the simplest of Geography types (as Lat/Long) beyond a Point (for example LINESTRING(1 1, 2 2) instance) will incur additional rows for each point, a column for sequencing for the order of each point and another column for grouping of lines. SQL Server also has methods for the Geography data types which include calculating Area, Boundary, Length, Distances, and more.

It seems unwise to store Latitude and Longitude as Decimal in Sql Server.

Update 2

If you plan on doing any calculations like distance, area, etc, properly calculating these over the surface of the earth is difficult. Each Geography type stored in SQL Server is also stored with a Spatial Reference ID. These id's can be of different spheres (the earth is 4326). This means that the calculations in SQL Server will actually calculate correctly over the surface of the earth (instead of as-the-crow-flies which could be through the surface of the earth).

enter image description here

Community
  • 1
  • 1
Erik Philips
  • 53,428
  • 11
  • 128
  • 150
  • 1
    To add to this information, using Geography truely expands the ability of sql searches from is a lat/long in between other lat/longs (usually just rectangles) because the Geography data type allows you to create multiple regions of almost any size and shape. – Erik Philips Sep 14 '11 at 04:59
  • 1
    thanks again. I did ask for reasons to consider using `geography` and you provided some good ones. Ultimately, I decided to just use `decimal` fields in this case (see my long-winded update), but it's good to know that I can use `geography` if I ever need to do anything fancier than simply mapping coordinates. – Jeff Ogata Sep 14 '11 at 18:05
6

Another thing to consider is the storage space taken up by each method. The geography type is stored as a VARBINARY(MAX). Try running this script:

CREATE TABLE dbo.Geo
(
    geo geography

)

GO

CREATE TABLE dbo.LatLon
(
    lat decimal(9, 6)
,   lon decimal(9, 6)

)

GO

INSERT dbo.Geo
SELECT geography::Point(36.204824, 138.252924, 4326) UNION ALL
SELECT geography::Point(51.5220066, -0.0717512, 4326) 

GO 10000

INSERT dbo.LatLon
SELECT  36.204824, 138.252924 UNION
SELECT 51.5220066, -0.0717512

GO 10000

EXEC sp_spaceused 'dbo.Geo'
EXEC sp_spaceused 'dbo.LatLon'

Result:

name    rows    data     
Geo     20000   728 KB   
LatLon  20000   400 KB

The geography data-type takes up almost twice as much space.

  • 2
    A note on space, [each coordinate is stored as a double-precision floating-point number that is 64 bits (8 bytes) long, and 8-byte binary value is roughly equivalent to 15 digits of decimal precision](http://alastaira.wordpress.com/2011/03/28/sql-server-spatial-coordinate-storage-precision/), so comparing a [decimal(9,6) which is only 5 bytes](http://msdn.microsoft.com/en-us/library/ms187746.aspx), isn't exactly a fair comparison. Decimal would have to be a minimum of Decimal(15,12) (9 bytes) for each LatLong (total of 18 bytes) for a real comparison. – Erik Philips Oct 09 '13 at 06:15
  • 9
    @ErikPhilips the point is why use a decimal(15, 12) when all you need is a decimal(9, 6)? The comparison above is a practical one - not an academic exercise. – Zephyr was a Friend of Mine Oct 09 '13 at 10:09
-1
    CREATE FUNCTION [dbo].[fn_GreatCircleDistance]
(@Latitude1 As Decimal(38, 19), @Longitude1 As Decimal(38, 19), 
            @Latitude2 As Decimal(38, 19), @Longitude2 As Decimal(38, 19), 
            @ValuesAsDecimalDegrees As bit = 1, 
            @ResultAsMiles As bit = 0)
RETURNS decimal(38,19)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @ResultVar  decimal(38,19)

    -- Add the T-SQL statements to compute the return value here
/*
Credit for conversion algorithm to Chip Pearson
Web Page: www.cpearson.com/excel/latlong.aspx
Email: chip@cpearson.com
Phone: (816) 214-6957 USA Central Time (-6:00 UTC)
Between 9:00 AM and 7:00 PM

Ported to Transact SQL by Paul Burrows BCIS
*/
DECLARE  @C_RADIUS_EARTH_KM As Decimal(38, 19)
SET @C_RADIUS_EARTH_KM = 6370.97327862
DECLARE  @C_RADIUS_EARTH_MI As Decimal(38, 19)
SET @C_RADIUS_EARTH_MI = 3958.73926185
DECLARE  @C_PI As Decimal(38, 19)
SET @C_PI =  pi()

DECLARE @Lat1 As Decimal(38, 19)
DECLARE @Lat2 As Decimal(38, 19)
DECLARE @Long1 As Decimal(38, 19)
DECLARE @Long2 As Decimal(38, 19)
DECLARE @X As bigint
DECLARE @Delta As Decimal(38, 19)

If @ValuesAsDecimalDegrees = 1 
Begin
    set @X = 1
END
Else
Begin
    set @X = 24
End 

-- convert to decimal degrees
set @Lat1 = @Latitude1 * @X
set @Long1 = @Longitude1 * @X
set @Lat2 = @Latitude2 * @X
set @Long2 = @Longitude2 * @X

-- convert to radians: radians = (degrees/180) * PI
set @Lat1 = (@Lat1 / 180) * @C_PI
set @Lat2 = (@Lat2 / 180) * @C_PI
set @Long1 = (@Long1 / 180) * @C_PI
set @Long2 = (@Long2 / 180) * @C_PI

-- get the central spherical angle
set @Delta = ((2 * ASin(Sqrt((power(Sin((@Lat1 - @Lat2) / 2) ,2)) + 
    Cos(@Lat1) * Cos(@Lat2) * (power(Sin((@Long1 - @Long2) / 2) ,2))))))

If @ResultAsMiles = 1 
Begin
    set @ResultVar = @Delta * @C_RADIUS_EARTH_MI
End
Else
Begin
    set @ResultVar = @Delta * @C_RADIUS_EARTH_KM
End

    -- Return the result of the function
    RETURN @ResultVar

END
  • 2
    New answers are always welcome, but please add some context. Briefly explaining how the above solves the issue makes the answer more useful to others. – Leigh Mar 31 '17 at 01:25