I know this question has been asked in various forms over and over again but I cannot find a single complete answer and I believe it is a general problem in the RDBMS/data area and industry. To explain the problem I will tell you a short (and maybe boring) story!
The Story
"I have a friend" who works in A
company that uses 100+ systems. The scale and size of these systems vary from full-blown ITIL to custom/in-house, single purpose, LAMP/SQLite/CSV-based solutions. The vast majority of these systems, at one point or another use a database/data-store... Big-data has now become a trend, and A
company though it is a very good idea to keep (or log) historical data from all systems forever! For that reason they have built a "warehouse". My friend is responsible for writing software that will do the analysis on some of this data ... however, he is kinda confused. There are thousands of tables in that warehouse containing data from the beginning of time (1970s I think :)).
The Problem
[Since I started telling you about this guy, I should probably continue]
My friend is very upset because of the lack of documentation in that warehouse. It seems that no-one knows what is what?! Few of the problems he faces (and I quote):
Man, some fields are constants... they have a special meaning to the application but I have no way of knowing? But that is OK... cause some other fields are bit-masks! Different bit values in the field have different meaning!
and he continues...
That's not all... these are the easy cases you know... Since we have data from multiple systems, we end up with a situation where different systems refer to the same thing in a different way... how can I explain it to you... for example, a network device has an FQDN, however some systems treat it as the primary key, some others don't and instead they allocate an auto-increment integer value, which in turn they use for foreign keys (you know... referencing this device).
and he can go on forever!
The Question
[Yes, it is one question]
He says:
We have come a long way regarding documentation in the software world... we have started with documents, moved to wikis and concluded to inline docblocks serving both as parameter/signature documentation and as wiki! We can auto-generate documentation, clear enough that a person in the other hemisphere and side of this world can easily follow!
and he continues:
... in the data side of things, we also had major achievements! Storage methods, serialization, transmission and data analysis techniques have evolved tremendously... We have also managed to map database tables into objects and in some cases we can even represent relationships!
So why the frell don't we have a standard method/technique of documenting our data structures in an RDBMS?
... he concluded :)
Enough with my friend, so my comments:
I know about comments on fields in various systems, but that is usually enough for a "deprecated" and not for an explanation
Updating a wiki or even worse a document, every time you release a database patch is not a solution... that patch should contain the relevant documentation!
ER diagrams can be easily generated based on the schema information, however this is not the easiest form of documentation to read... for anything more than 10 tables!
There is the saying (please comment if you know who said this! - respect)
Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing
Why SQL doesn't provide the means to any?