0

We need the ability to map various types of entities to different types of hierarchical data in our web application. As an example, consider an entity called Vendor. We need the ability to map each instance of vendor to geographical areas. Geographical areas are in the following hierarchy:

  1. Post code - most granular geographical area; example, EC2
  2. Locality - formed of post codes; example, Kensington. Each post code will be part of exactly one locality, no more.
  3. Town - formed of localities; example London. Each locality will be part of exactly one town, no more.
  4. District - formed of towns; example, Columbia.
  5. Province - formed of districts (equivalent to a state in some countries); example, South Carolina.
  6. Region - formed of provinces; example, Northeast Provinces.
  7. Country - formed of regions.
  8. Zone - formed of countries; example Southeast Asia.
  9. Continent - formed of zones.

We have a complete database of post codes, localities, towns, districts, provinces, regions, countries, zones and continents. This currently exists as tables in an RDBMS.

Our use cases:

  1. Ability to associate a Vendor with multiple geographies, at any level. For example, we could map Nestle to Mainland Europe (a zone), California (a US province) and Greater London (a district in UK).
  2. Ability to exclude some parts of a geography from the mapping. For example, when mapping Nestle to California, we may want to exclude San Diego.
  3. If the composition of a geography changes, no changes should be required to the mappings of which the geography is a part. For example, if a post code is added to Greater London, the mapping with Nestle should not require a change.
  4. Ability to query the database for a Vendor and a geography level. For example, if we query the database for Nestle and post codes, we should get all the post codes for Greater London, California (minus post codes for San Diego) and Mainland Europe. If we query the database for Nestle and countries, we should get UK (country for Greater London), US and all countries in Mainland Europe.

There are many such mappings with many different types of hierarchical data, this is just one of the requirements.

Looking for suggestions on storing the hierarchical data and the mappings. Please do not post answers that involve storing the data and mappings in RDBMS tables, as I am already aware of the options using RDBMS.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
manish
  • 19,695
  • 5
  • 67
  • 91
  • Should this question be tagged with database-design if you're not looking for a database solution? – Martin Wilson May 09 '12 at 07:56
  • I am not looking for an RDBMS based solution. There can be other database solutions involving NoSQL databases like Neo4j, Redis, etc. or hierarchical databases. – manish May 09 '12 at 08:26
  • Is there a _specific_ reason why you want to avoid a relational database? Perhaps you think this cannot be represented in a relational database? – Branko Dimitrijevic May 09 '12 at 11:15
  • As I have mentioned, we already have the geography data in a relational database. However, the performance of at least some of the use cases is quite poor. Do go through the 4th use case in detail to understand why the performance suffers when modeling this with a regular RDBMS. – manish May 09 '12 at 11:30
  • @manish.in.java - If you use the model that I suggested in my answer your 4th use case will not perform badly. As long as the `GEOGRAPHY` table has an attribute defining the layer/level of the item and there is an index on layer and left/right visitation numbers then retrieval will be very fast. – Joel Brown May 10 '12 at 16:33

1 Answers1

2

I know that you are not looking for a RDBMS solution, since you "know the options for RDBMS" - but you don't state what options you have considered and why you are rejecting them. I believe there may be an RDBMS option you haven't considered which has the advantage of low data maintenance and ease of data retrieval.

I suggest that you group your geographical regions into a single table with an involuted relationship (adjacency model). This allows you to describe your vendor coverage as a list of geographical coverage records. The trick is to record vendor coverage at the highest possible level. You can then also list exclusions to the coverage.

Here is a suggested logical model:

Logical ERD

The GEOGRAPHY table could use nested sets to simplify the querying of the hierarchical geography data.

Determining vendor coverage of a particular area would be as easy as confirming that a vendor has COVERAGE for the GEOGRAPHY of interest (probably at the lowest level) while not also being in a COVERAGE EXCLUSION for that same GEOGRAPHY.

Community
  • 1
  • 1
Joel Brown
  • 14,123
  • 4
  • 52
  • 64