
I'm working on moving some spatial searching capabilities from Postgres with PostGIS to SQL Server and I'm seeing some pretty terrible performance, even with indexes.

My data is around a million points, and I want to find out which of those points are within given shapes, so the query looks something like this:

DECLARE @Shape GEOMETRY = ...
SELECT * FROM PointsTable WHERE Point.STWithin(@Shape) = 1

If I select a fairly small shape, I can sometimes get sub-second times, but if my shape is fairly large (which they sometimes are), I can get times over 5 minutes. If I run the same searches in Postgres, they're always under a second (in fact, almost all are under 200 ms).

I've tried several different grid sizes on my indexes (all high, all medium, all low), different cells per object (16, 64, 256), and no matter what I do the times stay fairly constant. I'd like to try more combinations but I don't even know what direction to go. More cells per object? Less? Some strange combination of grid sizes?

I've looked at my query plans and they're always using the index, it's just not helping at all. I've even tried without the index, and it's not much worse.

Is there any advice anyone can give on this? Everything I can find suggests "we can't give you any advice on indexes, just try everything and maybe one will work", but with it taking 10 minutes to create an index, doing this blindly is a massive waste of time.

EDIT: I also posted this on a Microsoft forum. Here's some information they asked for on there:

The best working index I could get was this one:

CREATE SPATIAL INDEX MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX
    ON MapTesting (Location)
 USING GEOMETRY_GRID
  WITH (
    BOUNDING_BOX = ( -- The extent of our data, data is clustered in cities, but this is about as small as the index can be without missing thousands of points
        XMIN = -12135832,
        YMIN = 4433884,
        XMAX = -11296439,
        YMAX = 5443645),
    GRIDS = (
        LEVEL_1 = MEDIUM,
        LEVEL_2 = MEDIUM,
        LEVEL_3 = MEDIUM,
        LEVEL_4 = MEDIUM),
     CELLS_PER_OBJECT = 256 -- This was set to 16 but it was much slower
  )

I had some issues getting the index used, but this is different.

For these tests I ran a test search (the one listed in my original post) with a WITH(INDEX(...)) clause for each of my indexes (testing various settings for grid size and cells per object), and one without any hint. I also ran sp_help_spatial_geometry_index using each index and the same search shape. The index listed above ran fastest and also was listed as most efficient in sp_help_spatial_geometry_index.
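As a sketch, each test run looked roughly like this (the shape's WKT below is a placeholder, not our real search polygon):

```sql
-- Hypothetical search shape; ours were long thin boxes across the state
DECLARE @Shape GEOMETRY = geometry::STGeomFromText(
    'POLYGON ((-11800000 4500000, -11700000 4500000, -11700000 4600000, -11800000 4600000, -11800000 4500000))', 0);

-- Force a specific index to compare timings across index configurations
SELECT * FROM MapTesting WITH (INDEX (MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX))
WHERE Location.STWithin(@Shape) = 1;

-- Report how efficiently that index filters this particular shape
EXEC sp_help_spatial_geometry_index
    @tabname = 'MapTesting',
    @indexname = 'MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX',
    @verboseoutput = 1,
    @query_sample = @Shape;
```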

When running the search I get these statistics:

(1 row(s) affected)
Table 'MapTesting'. Scan count 0, logical reads 361142, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'extended_index_592590491_384009'. Scan count 1827, logical reads 8041, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 6735 ms,  elapsed time = 13499 ms.

I also tried using random points as data (since I can't give out our real data), but it turns out that this search is really fast with random data. This led us to believe that our problem is how the grid system works with our data.

Our data is addresses across the entire state, so there are a few very high density regions, but mostly sparse data. I think the problem is that no setting for the grid sizes works well for both. With grids set to HIGH, the index returns too many cells in low-density areas, and with grids set to LOW, the grids are useless in high density areas (at MEDIUM, it's not as bad, but still not good at either).

I am able to get the index used, it's just not helping. Every test was run with "show actual execution plan" turned on, and it always shows the index.

Brendan Long
  • +1: I would like to know too. – Nordic Mainframe Aug 12 '10 at 18:01
  • Is your million-points-of-data stored as geometry (flat, euclidean space) or geography (round, earthy-shaped coordinates)? If you are mixing geometry and geography, you're going to introduce a performance hit for the conversion math. – mwalker Aug 24 '10 at 17:47
  • @Mwalker, It's all geometry, and I don't think you can mix and match them anyway. – Brendan Long Aug 24 '10 at 19:21
  • Thanks for the comment about CELLS_PER_OBJECT. I was getting 6.8 seconds using CELLS_PER_OBJECT = 16 and levels = MEDIUM, and I changed CELLS_PER_OBJECT to 256 and levels = HIGH, and the time dropped to 2.8 seconds :-) – Guilherme Campos Hazan Mar 03 '18 at 21:08

8 Answers


I've just spent the day on a similar problem. In particular, we were doing a point-in-polygon type of query, where there was a relatively small set of polygons, but each polygon was large and complex.

The solution turned out to be as follows, for the spatial index on the polygon table:

  1. Use "geometry auto grid" instead of the old MMLL etc. This gives 8 levels of indexing instead of the old 4, and the settings are automatic. AND...
  2. Set 'cells per object' to 2000 or 4000. (Not an easy thing to guess, given that the default is 16!)

This made a huge difference. It was 10 times faster than a spatial index in the default configuration, and 60 times faster than no index at all.
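As a sketch, the resulting index definition looks something like this (table and column names are illustrative, and the bounding box is the one from the question; GEOMETRY_AUTO_GRID requires SQL Server 2012 or later):

```sql
CREATE SPATIAL INDEX SIX_Polygons_Shape
    ON Polygons (Shape)
 USING GEOMETRY_AUTO_GRID  -- 8 levels instead of 4, grid densities chosen automatically
  WITH (
    BOUNDING_BOX = (XMIN = -12135832, YMIN = 4433884,
                    XMAX = -11296439, YMAX = 5443645),
    CELLS_PER_OBJECT = 2000  -- far above the default of 16
  );
```

Note that GEOMETRY_AUTO_GRID does not accept a GRIDS clause; only the bounding box and cells per object remain tunable.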

John Rusk - MSFT

Here are some remarks about SQL Server's spatial extensions and how to ensure that the index is used efficiently:

Apparently, the planner has difficulty building a good plan if it does not know the actual geometry at parse time. The author suggests wrapping the query in exec sp_executesql:

Replace:

-- does not use the spatial index without a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
select a.id, a.shape.STAsText() 
from zipcodes a 
where a.shape.STIntersects(@latlonPoint)=1
go

with:

-- this does use the spatial index without using a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
exec sp_executesql 
N'select a.id, a.shape.STAsText() 
from zipcodes a 
where a.shape.STIntersects(@latlonPoint)=1', N'@latlonPoint geometry', @latlonPoint
go
Nordic Mainframe

Implementation efficiency issues aside, SQL Server uses a quadtree index whereas PostGIS uses an R-tree index.

The R-tree is in most cases the better algorithm, especially for large datasets with varying geometry sizes.
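For comparison, the PostGIS side gets its R-tree (implemented on top of GiST) with a one-liner and no grid tuning at all; the table and column names here are hypothetical:

```sql
-- PostGIS: R-tree-over-GiST index; adapts to data density automatically
CREATE INDEX points_geom_gist ON points_table USING GIST (geom);
```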

Björn Harrtell

My gut reaction is "because Microsoft hasn't bothered to make it fast, because it's not an Enterprise Feature". Maybe I'm being cynical.

I'm not sure why you're migrating away from Postgres either.

tc.
  • I suspect it has more to do with it being a new feature; I heard they're supposed to make it a lot better in the next version. What confuses me is that I haven't heard anything about it being slow, so I'm worried that I'm just missing something. – Brendan Long Aug 12 '10 at 19:39
  • 1
    After working with spatial indexes in SQL Server I'm beginning to this this is true. – Paul Mendoza Aug 04 '11 at 19:06

I believe STIntersects is better optimized to use the index and would perform better than STWithin, especially for larger shapes.

Giri
  • Would it be possible for you to post the plan information, after setting set statistics profile on, for the STIntersects query? – Giri Aug 26 '10 at 17:45

Have you set up your spatial index correctly? Is your bounding box correct? Are all the points inside it? In your case, HHMM for GRIDS would probably work best (depending, again, on the bounding box).

Can you try sp_help_spatial_geometry_index to see what's wrong? http://msdn.microsoft.com/en-us/library/cc627426.aspx

Try using the Filter method instead and tell us what performance numbers you get. (It executes only the primary filter, which uses the index, without going through the secondary filter, the true spatial operation.)

Something is wrong with your setup. Spatial is indeed a new feature, but it's not that bad.
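A primary-filter-only test could look like this (the shape's WKT is a placeholder; Filter may return false positives, so the counts are a diagnostic, not a correct result):

```sql
-- Hypothetical search shape in the data's coordinate space
DECLARE @Shape GEOMETRY = geometry::STGeomFromText(
    'POLYGON ((-11800000 4500000, -11700000 4500000, -11700000 4600000, -11800000 4600000, -11800000 4500000))', 0);

-- Index-only pass: measures raw index selectivity
SELECT COUNT(*) FROM MapTesting WHERE Location.Filter(@Shape) = 1;

-- Exact pass for comparison: primary plus secondary filter
SELECT COUNT(*) FROM MapTesting WHERE Location.STWithin(@Shape) = 1;
```

If the first count is far larger than the second, the grid index is passing too many candidates through to the expensive exact test.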

  • I've tried every combination of two sizes (LLLL, LLMM, LLHH, MMLL, etc.) and the best was MMMM with 256 cells per object. `sp_help_spatial_geometry_index` said the primary filter was 90% efficient, which I think might be the problem (others were as low as 70%). `Filter` was much faster than `STIntersects` but still 2-5x slower than Postgres (and not as accurate). – Brendan Long Aug 26 '10 at 17:47
  • We think the problem is that our data is fairly sparse with high-density regions, so the static-grid-size approach isn't helpful. If we set the grids to high, the index is too specific in sparse areas, but if we set it to low, the index is useless in high-density areas. – Brendan Long Aug 26 '10 at 17:51
  • Then try setting up multiple spatial indexes around each high-density region. Or at least break entire US into few big areas. I expect you have most of your data on east and west coast. – Desinderlase Aug 27 '10 at 10:52
  • our data isn't the whole US, it's just Colorado. The problem is that users can select areas that cross the entire state. For example, my test query is a thin box from Fort Collins to Denver. This has two high-density regions separated by a low density region, and I was under the impression that SQL Server will only use one spatial index at a time (and not indexing one of those two areas will be even worse than I have now). – Brendan Long Aug 27 '10 at 15:59
  • I can try loading this database up again on Monday to test it again, for the moment we just removed it because this configuration is way more complicated than just using two databases. – Brendan Long Aug 27 '10 at 16:00
  • One problem with SQL Server is the relatively coarse resolution - even an HHHH grid covering the entire world ends up with squares over 600 m/side. – Andrew Hill Dec 02 '14 at 07:41

You might try breaking it down into two passes:

  1. select candidates into a temp table w/ .Filter().
  2. query candidates w/ .STWithin().

eg:

SELECT * INTO #this FROM PointsTable WHERE Point.Filter(@Shape) = 1  -- primary (index) filter; may include false positives
SELECT * FROM #this WHERE Point.STWithin(@Shape) = 1                 -- exact test on the reduced candidate set

(replacing SELECT * with only the actual columns you need to reduce I/O)

This kind of micro-optimization shouldn't be necessary, but I have seen decent performance improvements before. Also, you will be able to gauge how selective your index is by the ratio of (1) to (2).

Peter Radocchia

I'm not familiar with spatial queries, but it could be a parameterized query problem.

Try writing the query without parameters, with a fixed value (use a value that performs slowly in the parameterized query), and run it. Compare the times with the parameterized version. If it's much faster, then your problem is parameterized queries.

If that is the case, I would dynamically build your SQL string with the parameter values embedded in it; that way the parameters can't cause problems.
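A quick way to test this is to inline the geometry as a literal, so the optimizer sees the actual shape at compile time (the polygon below is just a placeholder):

```sql
-- No parameter: compare this timing against the parameterized version
SELECT * FROM PointsTable
WHERE Point.STWithin(
        geometry::STGeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0)
      ) = 1;
```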

pete
  • It appears rather improbable that query parametrization is the issue here. The OP said: _"If I select a fairly small shape, I can sometimes get sub-second times, but if my shape is fairly large (which they sometimes are), I can get times over 5 minutes."_ **1.** Query performance seems to be affected by the query geometry's spatial properties. **2.** Even if you think parameterized queries could be slow, they shouldn't turn a fast query into one that takes over 5 minutes (!) to execute... I can't think of any reasonable explanation why they would have that effect. – stakx - no longer contributing Oct 18 '13 at 14:55