
Consider a model for matching clients and services. Clients may be both providers and consumers of services at various times. Clients may be individuals or groups (companies), the latter having multiple contacts. Contacts may have multiple addresses, phones, and e-mails. Some of these relationships will be one-to-one (e.g., service to provider), but most will be one-to-many or many-to-many (e.g., multiple contacts at a company would share the same address).

In this model several associative tables would typically exist: client_contact, contact_addr, contact_phone, contact_email, service_provider, service_consumer, and so on.
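A minimal sketch of a couple of those junction tables, assuming integer surrogate keys (the column names are illustrative, not from the original post):

    -- One junction table per relationship; each holds exactly one
    -- association type. Entity tables (client, contact, addr, ...)
    -- are assumed to exist already.
    CREATE TABLE client_contact (
        client_id  INT NOT NULL REFERENCES client(client_id),
        contact_id INT NOT NULL REFERENCES contact(contact_id),
        PRIMARY KEY (client_id, contact_id)
    );

    CREATE TABLE contact_addr (
        contact_id INT NOT NULL REFERENCES contact(contact_id),
        addr_id    INT NOT NULL REFERENCES addr(addr_id),
        PRIMARY KEY (contact_id, addr_id)
    );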

Say you issue a simple query for contact information for consumers of a given service. In addition to the six entity tables containing the data, the joins would reference five associative tables. Nothing particularly interesting about this kind of query, of course; we do it every day.
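For concreteness, such a query might look roughly like this (using the junction tables sketched above plus phone and e-mail analogues; all names are illustrative):

    -- Contact info for consumers of service 42:
    -- six entity tables, five associative tables.
    SELECT con.name, a.street, p.number, e.address
    FROM service s
    JOIN service_consumer sc ON sc.service_id  = s.service_id
    JOIN client cl           ON cl.client_id   = sc.client_id
    JOIN client_contact cc   ON cc.client_id   = cl.client_id
    JOIN contact con         ON con.contact_id = cc.contact_id
    JOIN contact_addr ca     ON ca.contact_id  = con.contact_id
    JOIN addr a              ON a.addr_id      = ca.addr_id
    JOIN contact_phone cp    ON cp.contact_id  = con.contact_id
    JOIN phone p             ON p.phone_id     = cp.phone_id
    JOIN contact_email ce    ON ce.contact_id  = con.contact_id
    JOIN email e             ON e.email_id     = ce.email_id
    WHERE s.service_id = 42;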

It occurred to me though: why not have a single "master" associative table holding all associations? It would require this master table to have an "association type" in addition to the two PKs, and for all PKs to be of the same type (ints, GUIDs, etc.).
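A sketch of what that master table might look like (the composite key and column names are my guesses at the idea, not a definitive design):

    -- All associations in one table; assoc_type says which relationship
    -- a row belongs to, so every entity's PK must share one data type.
    CREATE TABLE master_assoc (
        assoc_type VARCHAR(30) NOT NULL,  -- e.g. 'client_contact'
        pk1        INT NOT NULL,          -- "left" entity's key
        pk2        INT NOT NULL,          -- "right" entity's key
        PRIMARY KEY (assoc_type, pk1, pk2)
    );

    -- A mirror index would likely be needed for traversals
    -- in the other direction.
    CREATE INDEX ix_master_assoc_rev ON master_assoc (assoc_type, pk2, pk1);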

On the one hand, queries would become more complicated because each join would need to specify the association type as well as the PK. On the other hand, all joins would access the same table, and with appropriate indexing and caching, performance could improve dramatically.
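The earlier query reworked against the master table illustrates both points: every hop filters on the association type, and the same table appears once per hop (again, a sketch under the assumed schema):

    SELECT con.name, a.street
    FROM service s
    JOIN master_assoc m1 ON m1.assoc_type = 'service_consumer'
                        AND m1.pk1 = s.service_id
    JOIN client cl       ON cl.client_id = m1.pk2
    JOIN master_assoc m2 ON m2.assoc_type = 'client_contact'
                        AND m2.pk1 = cl.client_id
    JOIN contact con     ON con.contact_id = m2.pk2
    JOIN master_assoc m3 ON m3.assoc_type = 'contact_addr'
                        AND m3.pk1 = con.contact_id
    JOIN addr a          ON a.addr_id = m3.pk2
    WHERE s.service_id = 42;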

I assumed there might be a pattern (or anti-pattern) describing this approach, but I haven't found anything online. Has anyone tried it? If so, does it scale?

Any references you can provide would be appreciated.

djhill8262
  • Favorited and upvoted, as I've got a gut feeling this is a really bad idea, but I can't really pinpoint the exact (technical) reason. One might argue you're very, VERY vulnerable to locking issues with this setup, and you can't really add metadata to your many-to-many relations if needed. Also, I'd assume a proper RDBMS is optimized for dealing with the situations you mention in your case. – Erik van Brakel Nov 27 '10 at 03:11
  • That was my thought, which is why I was surprised not to find it documented as a really bad idea, at least where there would be a lot of CRUD. I suspect that with low TX volumes, and where queries could live with low isolation, it could be viable. I'd assumed that the single "master" table might yield better optimizations, but that might depend on the specific RDBMS. Comparing the plans ("master" vs. regular associative tables) would be instructive. – djhill8262 Nov 27 '10 at 05:01
  • I'm thinking the type would become the high-order part of the key or indexes, so joins would be something like: ON Type = 'Type1' AND PK1 = PK2? Will performance really be better in this case? – Mike K. Nov 29 '10 at 21:33

3 Answers


What you're describing reminds me of fact tables from data warehousing. My understanding is that you start with a typical transactional schema, with a table modeling every many-to-many relationship. Then, to restructure the data for easier dimensional analysis, you can aggregate some or all of the relationships in your schema into one wide table where each column is a key. This effectively performs all possible joins ahead of time and dumps the results into a table, inverting the purpose of query joins from following relationships to fetching the properties of your entities.
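For the schema in the question, such a wide table might look something like this (a rough sketch of the idea, not warehouse-grade design; names are illustrative):

    -- One pre-joined row per service/consumer/contact combination;
    -- every column is a key into an entity (dimension) table.
    CREATE TABLE service_contact_fact (
        service_id INT NOT NULL,
        client_id  INT NOT NULL,
        contact_id INT NOT NULL,
        addr_id    INT NOT NULL,
        phone_id   INT NOT NULL,
        email_id   INT NOT NULL
    );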

Anyway, my understanding of this stuff is hazy and my experience effectively nil, but maybe your idea is a fact table by another name, which would make them worth investigating.

spieden
  • Thanks dacc, that gives me a pattern to research, and it may lead to others. A quick search turned up several articles related to star schemas (warehousing) describing an "accumulating snapshot" for applications such as mortgage approvals and manufacturing processes. These don't parallel my model, but the pattern does have some similarities, and the technique of using views as aliases (for clients, contacts, services, etc.) may be useful. I have some down-time over the holidays and may put something together to see how it behaves. Thanks! – djhill8262 Nov 30 '10 at 17:23

This can be solved with abstraction and table inheritance.

An Individual Client, an Organization Client, and a Service Provider are all Parties, which play Roles.

An Email Address, Telephone Number, Web Address, and Physical Address are all Addresses.
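A sketch of that shape, with names guessed from the party/role pattern rather than taken from the question:

    -- Supertype tables: every concrete client or provider is a Party,
    -- and every contact point is an Address of some type.
    CREATE TABLE party (
        party_id   INT PRIMARY KEY,
        party_type VARCHAR(20) NOT NULL    -- 'INDIVIDUAL' or 'ORGANIZATION'
    );

    CREATE TABLE party_role (
        party_id INT NOT NULL REFERENCES party(party_id),
        role     VARCHAR(30) NOT NULL,     -- 'CLIENT', 'SERVICE_PROVIDER', ...
        PRIMARY KEY (party_id, role)
    );

    CREATE TABLE address (
        address_id   INT PRIMARY KEY,
        address_type VARCHAR(20) NOT NULL, -- 'EMAIL', 'PHONE', 'WEB', 'PHYSICAL'
        value        VARCHAR(255) NOT NULL
    );

    -- One junction table now covers every party-to-address relationship.
    CREATE TABLE party_address (
        party_id   INT NOT NULL REFERENCES party(party_id),
        address_id INT NOT NULL REFERENCES address(address_id),
        PRIMARY KEY (party_id, address_id)
    );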

Neil McGuigan

First off, I think you're definitely paying a price in maintainability. Any time I see a "type" column like that, I think red flag. It seems likely to lead to magic strings in your procedures: you need to make sure the type value is consistent across inserts and selects, for example. So any performance increase needs to be big enough to justify this headache.

Second, you're paying a price in storing more data: the extra "type" column for each association. That data also has to be retrieved when running a query, which affects how many rows can fit in memory at once (maybe).

Third, each query probably needs to access the same total number of rows, regardless of whether they're stored in multiple tables or one. So, unless you know something about your data that will let you create clustered indexes or something, you're probably retrieving the same number of pages when you run queries.

Fourth, the likely performance gain comes from assuming that the index has logarithmic behavior, and noting that 5 * log(N) is greater than log(5N), so it's better to use one big index than five smaller ones. However, the addition of the type column is going to reduce this benefit. I'm not really sure how to analyze whether it would eliminate the benefit completely, or just reduce it.
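Spelling out that arithmetic (base-2 logs, N rows per small table):

    5 * log2(N)   vs.   log2(5N) = log2(5) + log2(N) ≈ log2(N) + 2.32

    e.g. N = 1,000,000:   5 * log2(N) ≈ 99.7   vs.   log2(5N) ≈ 22.3

So a probe of the combined index costs only a couple of comparisons more than a probe of one small index; that's the gap on paper, and the wider (type + key) index entries then eat into it.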

Fifth, it seems pretty likely that for at least some queries, you're going to end up joining multiple copies of that huge table, which really seems like it's going to be a killer.

I'd be interested to see what results you get, but I'd be surprised if there's a performance benefit.

joelt