In my Postgres 9.5 database with PostGIS 2.2.0 installed, I have two tables with geometric data (points). I want to assign points from one table to points from the other, but no buildings.gid should be assigned twice: as soon as a buildings.gid is assigned, it must not be assigned to another pvanlagen.buildid.
Table definitions
buildings:
CREATE TABLE public.buildings (
  gid numeric NOT NULL DEFAULT nextval('buildings_gid_seq'::regclass),
  osm_id character varying(11),
  name character varying(48),
  type character varying(16),
  geom geometry(MultiPolygon,4326),
  centroid geometry(Point,4326),
  gembez character varying(50),
  gemname character varying(50),
  krsbez character varying(50),
  krsname character varying(50),
  pv boolean,
  gr numeric,
  capac numeric,
  instdate date,
  pvid numeric,
  dist numeric,
  CONSTRAINT buildings_pkey PRIMARY KEY (gid)
);

CREATE INDEX build_centroid_gix
  ON public.buildings
  USING gist
  (st_transform(centroid, 31467));

CREATE INDEX buildings_geom_idx
  ON public.buildings
  USING gist
  (geom);
pvanlagen:
CREATE TABLE public.pvanlagen (
  gid integer NOT NULL DEFAULT nextval('pv_bis2010_bayern_wgs84_gid_seq'::regclass),
  tso character varying(254),
  tso_number numeric(10,0),
  system_ope character varying(254),
  system_key character varying(254),
  location character varying(254),
  postal_cod numeric(10,0),
  street character varying(254),
  capacity numeric,
  voltage_le character varying(254),
  energy_sou character varying(254),
  beginning_ date,
  end_operat character varying(254),
  id numeric(10,0),
  kkz numeric(10,0),
  geom geometry(Point,4326),
  gembez character varying(50),
  gemname character varying(50),
  krsbez character varying(50),
  krsname character varying(50),
  buildid numeric,
  dist numeric,
  trans boolean,
  CONSTRAINT pv_bis2010_bayern_wgs84_pkey PRIMARY KEY (gid),
  CONSTRAINT pvanlagen_buildid_fkey FOREIGN KEY (buildid)
    REFERENCES public.buildings (gid) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid)
);

CREATE INDEX pv_bis2010_bayern_wgs84_geom_idx
  ON public.pvanlagen
  USING gist
  (geom);
Query
My idea was to add a boolean column pv to the buildings table, which is set once a buildings.gid has been assigned:
UPDATE pvanlagen
SET buildid=buildings.gid, dist='50'
FROM buildings
WHERE buildid IS NULL
  AND buildings.pv is NULL
  AND pvanlagen.gemname=buildings.gemname
  AND ST_Distance(ST_Transform(pvanlagen.geom,31467)
                 ,ST_Transform(buildings.centroid,31467))<50;

UPDATE buildings
SET pv=true
FROM pvanlagen
WHERE buildings.gid=pvanlagen.buildid;
I tested this with 50 rows in buildings, but it takes too long to apply to all of them. I have 3,200,000 buildings and 260,000 PV installations.
The gid of the closest building should be assigned. In case of ties, it does not matter which gid is assigned. If a rule is needed, take the building with the lower gid.
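To make the tie rule concrete, here is a minimal sketch on made-up candidate pairs (the pv_id, building_gid and dist values are purely illustrative, not from my data). It only shows the "closest first, then lower gid" ordering; it does not stop one building from being picked for two PV installations, which is the part I am stuck on:

```sql
-- Hypothetical candidate pairs: (PV id, building gid, distance in meters).
SELECT DISTINCT ON (pv_id)
       pv_id, building_gid, dist
FROM  (VALUES (1, 10, 12.5)
            , (1, 11, 12.5)   -- ties with building 10
            , (1, 12, 30.0)
            , (2, 11,  8.0)) AS c(pv_id, building_gid, dist)
ORDER  BY pv_id, dist, building_gid;  -- closest first, lower gid wins ties
```

For PV 1 this picks building 10 over building 11, since both are equally close and 10 is the lower gid.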
The 50 meters were meant as a limit. I used ST_Distance() because it returns the minimum distance, which should be within 50 meters. Later I raised the limit several times until every PV installation was assigned.
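As far as I understand, the same limit could also be written with ST_DWithin(), which should be able to use the expression index build_centroid_gix because the indexed expression ST_Transform(centroid, 31467) appears unchanged in the query. A sketch of the candidate search under that assumption (I have not benchmarked it on the full tables):

```sql
-- Sketch: unassigned PV installations paired with free buildings within 50 m.
-- EPSG:31467 is a metric projection, so the last argument is in meters.
SELECT p.gid AS pv_gid
     , b.gid AS building_gid
     , ST_Distance(ST_Transform(p.geom, 31467)
                 , ST_Transform(b.centroid, 31467)) AS dist
FROM   pvanlagen p
JOIN   buildings b ON b.gemname = p.gemname
                  AND ST_DWithin(ST_Transform(p.geom, 31467)
                               , ST_Transform(b.centroid, 31467), 50)
WHERE  p.buildid IS NULL   -- PV installation not assigned yet
AND    b.pv IS NULL;       -- building not taken yet
```

This only lists candidates; a building can still appear for several PV installations, so it does not yet enforce that each buildings.gid is used once.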
Buildings and PV installations are assigned to their respective regions (gemname). This should make the assignment cheaper, since I know the nearest building must be within the same region.
I tried this query after feedback below:
UPDATE pvanlagen p1
SET buildid = buildings.gid
  , dist = buildings.dist
FROM (
   SELECT DISTINCT ON (b.gid)
          p.id, b.gid, b.dist::numeric
   FROM (
      SELECT id, ST_Transform(geom, 31467)
      FROM pvanlagen
      WHERE buildid IS NULL -- not assigned yet
      ) p
   , LATERAL (
      SELECT b.gid, ST_Distance(ST_Transform(p1.geom, 31467), ST_Transform(b.centroid, 31467)) AS dist
      FROM buildings b
      LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid
      WHERE p1.buildid IS NULL
      AND b.gemname = p1.gemname
      ORDER BY ST_Transform(p1.geom, 31467) <-> ST_Transform(b.centroid, 31467)
      LIMIT 1
      ) b
   ORDER BY b.gid, b.dist, p.id -- tie breaker
   ) x, buildings
WHERE p1.id = x.id;
But it returns 0 rows affected in 234 ms execution time.
Where am I going wrong?