This is a largely simplified version of a function I use in an app that built around 3 years ago. Adapted to the question at hand.
Finds locations in the perimeter of a point using a box. One could do this with a circle to get more accurate results, but this is only meant to be an approximation to begin with.
Ignores the fact that the world is not flat. My application was only meant for a local region, a few 100 kilometers across. And the search perimeter only spans a few kilometers across. Making the world flat is good enough for the purpose. (Todo: A better approximation for the ratio lat/lon depending on the geolocation might help.)
Operates with geocodes like you get from Google maps.
Works with standard PostgreSQL without extension (no PostGis required), tested on PostgreSQL 9.1 and 9.2.
Without index, one would have to calculate the distance for every row in the base table and filter the closest ones. Extremely expensive with big tables.
Edit:
I rechecked and the current implementation allows a GisT index on points (Postgres 9.1 or later). Simplified the code accordingly.
The major trick is to use a functional GiST index of boxes, even though the column is just a point. This makes it possible to use the existing GiST implementation.
With such a (very fast) search, we can get all locations inside a box. The remaining problem: we know the number of rows, but we do not know the size of the box they are in. That's like knowing part of the answer, but not the question.
I use a similar reverse-lookup approach to the one described in more detail in this related answer on dba.SE. (Only, I am not using partial indexes here - might actually work, too).
Iterate through an array of pre-defined search-steps, from very small up to "just big enough to hold at least enough locations". Means we have to run a couple of (very fast) queries to get to the size for the search box.
Then search the base table with this box and calculate actual distance for only the few rows returned from the index. There will usually be some surplus since we found the box holding at least enough locations. By taking the closest ones, we effectively round the corners of the box. You could force this effect by making the box a notch bigger (multiply radius
in the function by sqrt(2) to get completely accurate results, but I wouldn't go all out, since this is approximating to begin with).
This would be even faster and simpler with an SP GiST index, available in the latest version of PostgreSQL. But I don't know if that's possible yet. We'd need an actual implementation for the data type and I didn't have the time to dive into it. If you find a way, promise to report back!
Given this simplified table with some example values (adr
.. address):
CREATE TABLE adr(adr_id int, adr text, geocode point);
INSERT INTO adr (adr_id, adr, geocode) VALUES
(1, 'adr1', '(48.20117,16.294)'),
(2, 'adr2', '(48.19834,16.302)'),
(3, 'adr3', '(48.19755,16.299)'),
(4, 'adr4', '(48.19727,16.303)'),
(5, 'adr5', '(48.19796,16.304)'),
(6, 'adr6', '(48.19791,16.302)'),
(7, 'adr7', '(48.19813,16.304)'),
(8, 'adr8', '(48.19735,16.299)'),
(9, 'adr9', '(48.19746,16.297)');
The index looks like this:
CREATE INDEX adr_geocode_gist_idx ON adr USING gist (geocode);
-> SQLfiddle
You'll have to adjust the home area, the steps and the scaling factor to your needs. As long as you search in boxes of a few kilometers around a point, a flat earth is a good enough approximation.
You need to understand plpgsql well to work with this. I feel I have done quite enough here.
CREATE OR REPLACE FUNCTION f_find_around(_lat double precision, _lon double precision, _limit bigint = 50)
RETURNS TABLE(adr_id int, adr text, distance int) AS
$func$
DECLARE
_homearea CONSTANT box := '(49.05,17.15),(46.35,9.45)'::box; -- box around legal area
-- 100m = 0.0008892 250m, 340m, 450m, 700m,1000m,1500m,2000m,3000m,4500m,7000m
_steps CONSTANT real[] := '{0.0022,0.003,0.004,0.006,0.009,0.013,0.018,0.027,0.040,0.062}'; -- find optimum _steps by experimenting
geo2m CONSTANT integer := 73500; -- ratio geocode(lon) to meter (found by trial & error with google maps)
lat2lon CONSTANT real := 1.53; -- ratio lon/lat (lat is worth more; found by trial & error with google maps in (Vienna)
_radius real; -- final search radius
_area box; -- box to search in
_count bigint := 0; -- count rows
_point point := point($1,$2); -- center of search
_scalepoint point := point($1 * lat2lon, $2); -- lat scaled to adjust
BEGIN
-- Optimize _radius
IF (_point <@ _homearea) THEN
FOREACH _radius IN ARRAY _steps LOOP
SELECT INTO _count count(*) FROM adr a
WHERE a.geocode <@ box(point($1 - _radius, $2 - _radius * lat2lon)
, point($1 + _radius, $2 + _radius * lat2lon));
EXIT WHEN _count >= _limit;
END LOOP;
END IF;
IF _count = 0 THEN -- nothing found or not in legal area
EXIT;
ELSE
IF _radius IS NULL THEN
_radius := _steps[array_upper(_steps,1)]; -- max. _radius
END IF;
_area := box(point($1 - _radius, $2 - _radius * lat2lon)
, point($1 + _radius, $2 + _radius * lat2lon));
END IF;
RETURN QUERY
SELECT a.adr_id
,a.adr
,((point (a.geocode[0] * lat2lon, a.geocode[1]) <-> _scalepoint) * geo2m)::int4 AS distance
FROM adr a
WHERE a.geocode <@ _area
ORDER BY distance, a.adr, a.adr_id
LIMIT _limit;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_find_around (48.2, 16.3, 20);
Returns a list of $3
locations, if there are enough in the defined maximum search area.
Sorted by actual distance.
Further improvements
Build a function like:
CREATE OR REPLACE FUNCTION f_geo2m(double precision, double precision)
RETURNS point AS
$BODY$
SELECT point($1 * 111200, $2 * 111400 * cos(radians($1)));
$BODY$
LANGUAGE sql IMMUTABLE;
COMMENT ON FUNCTION f_geo2m(double precision, double precision)
IS 'Project geocode to approximate metric coordinates.
SELECT f_geo2m(48.20872, 16.37263) --';
The (literally) global constants 111200
and 111400
are optimized for my area (Austria) from the Length of a degree of longitude and The length of a degree of latitude, but basically just work all over the world.
Use it to add a scaled geocode to the base table, ideally a generated column like outlined in this answer:
How do you do date math that ignores the year?
Refer to 3. Black magic version where I walk you through the process.
Then you can simplify the function some more: Scale input values once and remove redundant calculations.