7

I want to build a database of geographical locations and would like to be able to identify locations that fall inside other locations. For example, the Empire State Building is going to have one geo-coordinate, but my database would be able to tell me that it falls inside Manhattan, which falls inside New York City, which is in the state of New York, and so forth.

I've been looking at OpenStreetMap, which seems to have a pretty decent database, but as best I can tell I would need to create a set of polygon structures representing each region and then detect whether a coordinate falls inside a given region's polygon. Is there a better way to do this, or is there a data source where all of this has already been calculated?

Nathan Ridley
  • If such a database existed, it would be tightly held due to its complexity, or would be expensive. If you don't do it based on polygons, how else would you expect it to work? – Randy Jul 25 '11 at 18:13
  • Not being an expert in such things, I don't presume that I have thought of everything, hence the question! – Nathan Ridley Jul 25 '11 at 18:16
  • You either have to use your own hierarchy of polygons or use an existing reverse geocoding solution (which uses polygons internally anyway). – Peter Popov Jul 25 '11 at 21:45

3 Answers

3

Try the Yahoo! GeoPlanet Data at http://developer.yahoo.com/geo/geoplanet/data/

It is already organised into a hierarchy: countries, admin divisions and places.

You can also extend the data by using the 'Geo' methods of the YQL API at http://developer.yahoo.com/yql/console/

Chaoley
  • Interesting - my query volume would outstrip their rate limits by a long shot unfortunately though. I need a home grown solution. – Nathan Ridley Jul 26 '11 at 17:55
  • Nathan- You know you can download the data for use offline, right? Read the first link in @Chaoley's response. – RyanKDalton Nov 30 '11 at 17:38
  • You might also reference this question for ideas: http://gis.stackexchange.com/questions/12945/open-datasets-with-centroids-or-other-geometry-for-woeids – RyanKDalton Nov 30 '11 at 18:11
1

You may also want to look into the GeoNames database. While it is not classified using a hierarchical method, you could probably derive the information from it.

If you really want to dive into building a geographical database where you can analyze the data, take a look at loading your data into the free/open-source PostgreSQL/PostGIS stack. With that you can actually write SQL that answers questions like "show me all points [within a city/county/state boundary]" or "[within X distance from Y location]".
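As a rough illustration of the two kinds of question above, here is a minimal sketch of what the SQL could look like when issued from Python. The table and column names (`regions`, `places`, `name`, `level`, `geom`) are made up for the example and are not part of any real dataset; `ST_Contains`, `ST_DWithin`, `ST_MakePoint` and `ST_SetSRID` are PostGIS functions. The helper takes any DB-API cursor (for instance one from psycopg2, which accepts the `%(name)s` parameter style used here).

```python
# Hypothetical schema, assumed only for this sketch:
#   regions(name text, level int, geom geometry(MultiPolygon, 4326))
#   places(name text, geom geometry(Point, 4326))

# "Which regions is this point inside?"
# Note that ST_MakePoint takes x (longitude) first, then y (latitude).
REGIONS_CONTAINING_POINT = """
SELECT name, level
FROM regions
WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326))
ORDER BY level
"""

# "Which places are within X metres of this point?"
# Casting to geography makes ST_DWithin measure the distance in metres.
PLACES_WITHIN_DISTANCE = """
SELECT name
FROM places
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
    %(metres)s
)
"""


def regions_containing(cursor, lon, lat):
    """Run the containment query on a DB-API cursor and return all rows."""
    cursor.execute(REGIONS_CONTAINING_POINT, {"lon": lon, "lat": lat})
    return cursor.fetchall()
```

With a spatial index on `geom`, a containment query like this is usually what makes a self-hosted setup practical at high query volumes, though that depends on how detailed your boundary polygons are.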

Good places to learn more about PostGIS are the BostonGIS website, the GIS.StackExchange pages, or of course the manual, but who reads those anymore...

RyanKDalton
  • Ah great link to Geonames DB and GIS.StackExchange, thanks. Also, I'll probably be using MongoDB which has geo functionality as well. – Nathan Ridley Dec 01 '11 at 15:47
  • You might also want to reference these articles too, then: http://stackoverflow.com/questions/7903712/spatial-data-with-mongodb-or-cassandra and http://ralphbarbagallo.com/2011/04/02/an-overview-of-geospatial-databases/ – RyanKDalton Dec 01 '11 at 16:21
0

I'm pretty sure the Google Maps API has regions defined as polygons. By regions I mean state, city, zip code, or just about anything that could be defined as a "region".

You would have to hit-test (Google Maps might have a function for this already) a point to see if it is inside a polygon.
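If you end up doing the hit-test yourself, a common approach is ray casting: count how many polygon edges a horizontal ray from the point crosses, and an odd count means the point is inside. The sketch below is a plain-Python version, not anything taken from the Google Maps API. It treats longitude and latitude as flat x/y coordinates, so it assumes the polygon has no holes and does not cross the antimeridian or a pole. The function name and the square used in the usage lines are made up for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting hit-test.

    polygon is a list of (lon, lat) vertices in order; the last vertex is
    implicitly joined back to the first. Points exactly on an edge may be
    reported either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count the crossing if it lies to the east of the point.
            if lon < x_cross:
                inside = not inside
    return inside


# Usage with a made-up square region.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))  # True
print(point_in_polygon(5.0, 2.0, square))  # False
```

For a nested hierarchy you would run this against each region's polygon, ideally after a cheap bounding-box check so that most regions are rejected without walking their edges.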

You could also use the reverse geocoding functions (address lookup by geolocation) to find which region(s) a point resides in, and just use that.

Neil N
  • Thanks for the link - my querying would hit their rate limits too quickly though. Need to host and process the data myself I think. – Nathan Ridley Jul 26 '11 at 17:56
  • How often would you get new addresses? It's not like the city a location resides in would change often. Spread your queries over time and eventually you will only need to hit them occasionally. That is assuming you store the results on your end. – Neil N Jul 26 '11 at 18:12
  • I'm not necessarily looking for addresses. I have a large number of topics on arbitrary things and each has a geographical coordinate. It could be in the middle of Manhattan or it could be in the middle of the Amazon Jungle. Also I can't spread my querying out because there are too many queries to do. It's for my business which provides a tonne of preanalysed data for customers. – Nathan Ridley Jul 26 '11 at 19:57
  • @Nathan, what I mean is how often will you be querying locations (address or not) that you haven't queried before? If not very often, why not cache the results in your own DB? – Neil N Jul 26 '11 at 20:10
  • Probably very commonly. And hundreds of thousands of queries per day. – Nathan Ridley Jul 27 '11 at 12:49