
This is a conceptual question, but I should have asked it long ago on this forum.

I have a PostGIS database, and I have many tables in it. I have researched some on the use of keys in databases, but I'm not sure how to incorporate keys in the case of the point data that is dynamic and increases with time.

I'm storing point data in one table, and this data grows each day. It's about 10 million rows right now and will probably grow about 10 million rows each year or so. There are lat, lon, time, and the_geom columns.
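The table looks roughly like this (the id column and the SRID are my guesses; lat, lon, time, and the_geom are the actual columns):

    -- Illustrative schema; the id column and SRID 4326 are assumptions
    CREATE TABLE points (
        id       bigserial PRIMARY KEY,
        lat      double precision,
        lon      double precision,
        "time"   timestamptz,           -- quoted since time is also a SQL type name
        the_geom geometry(Point, 4326)
    );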

I have several other tables, each representing different polygon groups (converted shapefiles to tables with shp2pgsql), like counties, states, etc.
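For example, each one was loaded with something along these lines (the SRID, file name, and table name are just placeholders):

    # -s sets the SRID, -I builds a GiST index on the geometry column
    shp2pgsql -s 4326 -I counties.shp public.counties | psql -d mydb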

I'm writing queries that relate the point data to the spatial tables to see if points are inside of the polygons, resulting in things like "55 points in X polygon in the past 24 hours", etc.
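A typical query looks something like this (names are illustrative, and I'm assuming the polygon geometry column is called geom):

    -- Count points that fell inside county X in the last day
    SELECT count(*)
    FROM points p
    JOIN counties c ON ST_Contains(c.geom, p.the_geom)
    WHERE c.name = 'X'
      AND p."time" > now() - interval '24 hours';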

The problem is, I don't have a key that relates the point table to the other tables. I think this is probably inhibiting query efficiency, but I'm not sure.

I know this question is fairly vague, and I'm happy to clarify anything, but I basically have a bunch of points in a table that I'm spatially comparing to other tables, and I'm trying to find the best way to design things.

Thanks for any help!

user1610717

2 Answers


If you don't have them already, you should build a spatial index on both the point and polygon tables.
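For example (index, table, and column names are illustrative):

    -- GiST indices on the geometry columns of both sides of the join
    CREATE INDEX points_geom_idx   ON points   USING GIST (the_geom);
    CREATE INDEX counties_geom_idx ON counties USING GIST (geom);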

In any case, spatial comparisons are usually slower than numerical comparisons.

So adding one or more keys to the point table referencing the other tables, and using them in your SELECT queries instead of spatial operations, will surely speed things up. Obviously, inserts will be slower, but given the numbers you gave (10 million per year), that should not be a problem.

Probably just adding a foreign key to the smallest entities (cities, for example) and joining the others (counties, states...) to get results will be faster than spatial comparison.
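A sketch of that approach, assuming the polygon tables have a gid primary key (which shp2pgsql creates by default; the other names are illustrative):

    -- Key column on the point table referencing the polygon table
    ALTER TABLE points ADD COLUMN county_gid integer REFERENCES counties (gid);

    -- One-time fill via a spatial join
    UPDATE points p
    SET county_gid = c.gid
    FROM counties c
    WHERE ST_Contains(c.geom, p.the_geom);

    -- Support the typical filters
    CREATE INDEX points_county_time_idx ON points (county_gid, "time");

    -- Counting then needs no spatial operation at query time
    SELECT count(*)
    FROM points
    WHERE county_gid = 42            -- some county's gid, for illustration
      AND "time" > now() - interval '24 hours';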

fradal83
  • Thanks! I think I'm tracking. So when I insert point data (which I'm basically copying from a CSV file every 10 minutes), I can run an update statement after the copy that runs a spatial query against cities or counties on the rows from the past 10 minutes and fills a field like "city" or "county" as the foreign key? On a little side note, but certainly related: one query I've found to be painfully slow (and never successful thus far) is an ST_DWithin on 24 hours of point data within 5000 m of a polygon. – user1610717 Jun 24 '16 at 04:07
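That incremental update could look something like this (names as in the sketch above; filtering on the still-NULL key rather than a time window avoids missing rows if a run is delayed):

    -- After each 10-minute COPY, tag only rows not yet assigned
    UPDATE points p
    SET county_gid = c.gid
    FROM counties c
    WHERE p.county_gid IS NULL
      AND ST_Contains(c.geom, p.the_geom);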

Foreign keys (and other constraints) are not needed to query. Moreover, they arise as a consequence of whatever design is appropriate to the application, per principles of good design.

They just tell the DBMS that a list of values under a list of columns in one table also appears elsewhere as a list of values under a list of columns in some table. (This is for avoiding errors and improving optimization.)

You still would want indices on columns that will be involved in joins. E.g., you might want the X coordinates in two tables to have sorted indices, in the same order. This is independent of whether one column's values form a subset of another's, i.e. whether a foreign key constraint holds between them.
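For instance, with the column names from the question (illustrative):

    -- Plain btree indices on columns used in joins and range filters
    CREATE INDEX points_lon_idx  ON points (lon);
    CREATE INDEX points_time_idx ON points ("time");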

philipxy