Dynamic contact information data/design pattern: Is this in any way feasible?

Question

I'm currently working on a web business application that has many entities (people,organizations) with lots of contact information ie. multiple postal addresses, email addresses, phone numbers etc.

At the moment the database schema is such that persons table has postal address columns, phone number columns as does organizations table. This is not a good way to handle this.

I've read the c2 Wiki on this and there's some good discussion regarding Contact and address models (http://c2.com/cgi-bin/wiki?ContactAndAddressModels) and wheter or not physical addresses are archaic (http://c2.com/cgi-bin/wiki?ArePhysicalPostalAddressesArchaic). These two discussions really opened my eyes on the scope of this problem.

I'm thinking about separating contact information fields to separate table(s). But what's the best way to do this. At the moment the application mainly handles Finnish addresses but it's on the horizon that it needs also to handle international addresses.

I could define an "addresses" -table, a "phone numbers" -table, an "email addresses" -table and so on and these would be linked to people and organizations. But this just feels too much like the previous solution: it's inevitable that the predefined database schema isn't sufficient.

What I'm proposing is to create a contact information schema/program logic that is dynamic:

There are no predefined contact information fields/field sets
Users can define new contact information types and required fields at any time like
- Finnish postal address
- Swedish postal address
- ... postal address
- Phone number
- Email address
- ICQ-number

Is this feasible? Has anyone done anything like this?

There could be a table that defines contact information types:

contact information types

Id: Identifier
Name: "Finnish postal address"
Description: "Use this contact information type for finnish postal addresses"

Then there could be a table that defines what fields are used per contact information type:

contact information type fields

Id: Identifier
Contact_information_type_id: References the previous table
Field title: "Address line 1"
Field description: "Use this line for postal addresses' first line"
Field type: String/Integer/etc.
Field format: Regular expression for validating field data
Field order: In which order should this field appear when displaying/using this contact information type

Then we'd have a "contact information table" that just is used to map contact information fields together:

contact information

Id: Identifier
Contact_information_type_id: References the contact information type table

Then we'd have a "contact information of person" -table mapping different contact information to persons:

contact information of person

Id: Identifier
Contact_information_id: References the contact information table
Person id: References the person

Then we'd need tables per contact information field type like:

contact information integer fields

Id: Identifier

Contact_information_id: References the contact information table

Value: The value of this field

and so on for strings etc...

Finally when displaying different contact information of a given person this would happen through person's contact information -table whis looks up what fields are used to form this contact information from contact information type fields -table through contact information -table. After determining what fields are used all the necessary tables would be joined together.

I'm having doubts about the feasibility of in SQL. Any thoughts?

In Java I probably could program some logic to determine what tables are neede to form a contact information entity and then i could use some sort of dynamic beans to represent this data in Java. But that's a bit foggy to me too. Anyt thoughts on this too?

Tom Leys · Answer 1 · 2008-09-19T04:09:41.580

It is starting to sound like you have a perfectly good hammer (i.e your SQL database) and you are trying to make another hammer with it (a meta-language to define SQL schemas).

Before you go down this path, there are many products on the market that aim to store customer details in an SQL database. It might be best to just purchase one off the shelf and integrate with it. Then all the concerns you have are addressed by someone else and you can focus on your specific business case.

Edit: One example of a package that allows you to add custom contact fields is SugarCRM - it is a commercial product where you buy access to the source on purchase. I'm sure there are many more but this is the only one that comes to mind at present.

Could you give any examples of such products? – kosoant Sep 18 '08 at 08:22 — kosoant, Sep 18 '08 at 08:22

score 1 · Answer 2 · answered Sep 17 '08 at 07:56

Your design is feasible, and I'm as big a fan of normalization as the next guy, but you really have to find a balance somewhere. So to begin, I think you're right that having fields like address1, address2, address3, etc... is bad practice. And if you are planning on handling many different types of mailing addresses from different countries, it might make sense to abstract out various address types.

Think about the data you're going to want to get out of the system - for example, will someone be asking for all the customers in a certain state or province? In that case your design will be pretty painful.

Another thing to keep in mind is that database schema changes, though they can sometimes be painful, are not the worst thing in the world. Follow that path to it's logical extreme and you'll end up with one gigantic table with fields like "key" and "value" and thousands of self-joins in every query.

Good luck finding the right balance!

score 0 · Answer 3 · answered Sep 17 '08 at 07:50

0

This is not a very informative post; have you had a look at how the vCard people handle the same issues? Also, be careful of overengineering, you might end up with N3.

answered Sep 17 '08 at 07:50

I did indeed look at vCards, but there was nothing to really help me with this database design issue. – kosoant Sep 18 '08 at 08:25

score 0 · Answer 4 · answered Sep 17 '08 at 08:57

First: Speaking pragmatically, it depends what you want to do with the data. In my experience, 99% of all address data is only ever used as a string to be printed on a letter. If that is the case for you, then you should stop worrying and just store it as a string. Of course, if you're doing deeper work with it then it's not going to be so easy.

Apart from that... I like the way you're thinking. I have done similar things (although not with addresses) to handle dynamic schemas. The problem I run into is (as you've identified) that the SQL to extract the stuff gets complex. Another problem is that this flexibility can lead to spaghetti data, in exactly the same way you can get spaghetti code. I.e. the meaning of what's in your tables can become obscured because you can only understand it by looking at the code which accesses it.

So, what you have to decide is where you are prepared to accept complexity, and what kind of complexity you can best handle. If you don't mind complex SQL, then go ahead and build your dynamic schema. If you do mind complex SQL, then either build the static tables (with one table per address type), or accept that you won't have such an elegant data structure.

So, short answer: you have to call it.

It's true that our uses addresses mainly just to print them on envelopes. But the problem with plain text fields without any kind of validation is that the data users input is rarely any good if they're not forced to input correctly formed data. That's a bit harsh on the user, but often necessary — kosoant, Sep 18 '08 at 08:28
Good point. You can always impose validation on the user at input time, but you don't have to store the data in any kind of structured way. I've spent a long time thinking about this very question, and I'm not really sure what the best way is, either! — AJ., Sep 22 '08 at 10:48