0

I know this question has been asked in various forms over and over again but I cannot find a single complete answer and I believe it is a general problem in the RDBMS/data area and industry. To explain the problem I will tell you a short (and maybe boring) story!

The Story

"I have a friend" who works in A company that uses 100+ systems. The scale and size of these systems vary from full-blown ITIL to custom/in-house, single purpose, LAMP/SQLite/CSV-based solutions. The vast majority of these systems, at one point or another use a database/data-store... Big-data has now become a trend, and A company though it is a very good idea to keep (or log) historical data from all systems forever! For that reason they have built a "warehouse". My friend is responsible for writing software that will do the analysis on some of this data ... however, he is kinda confused. There are thousands of tables in that warehouse containing data from the beginning of time (1970s I think :)).

The Problem

[Since I started telling you about this guy, I should probably continue]

My friend is very upset because of the lack of documentation in that warehouse. It seems that no-one knows what is what?! Few of the problems he faces (and I quote):

Man, some fields are constants... they have a special meaning to the application but I have no way of knowing? But that is OK... cause some other fields are bit-masks! Different bit values in the field have different meaning!

and he continues...

That's not all... these are the easy cases you know... Since we have data from multiple systems, we end up with a situation where different systems refer to the same thing in a different way... how can I explain it to you... for example, a network device has an FQDN, however some systems treat it as the primary key, some others don't and instead they allocate an auto-increment integer value, which in turn they use for foreign keys (you know... referencing this device).

and he can go on forever!

The Question

[Yes, it is one question]

He says:

We have come a long way regarding documentation in the software world... we have started with documents, moved to wikis and concluded to inline docblocks serving both as parameter/signature documentation and as wiki! We can auto-generate documentation, clear enough that a person in the other hemisphere and side of this world can easily follow!

and he continues:

... in the data side of things, we also had major achievements! Storage methods, serialization, transmission and data analysis techniques have evolved tremendously... We have also managed to map database tables into objects and in some cases we can even represent relationships!

So why the frell don't we have a standard method/technique of documenting our data structures in an RDBMS?

... he concluded :)

Enough with my friend, so my comments:

  • I know about comments on fields in various systems, but that is usually enough for a "deprecated" and not for an explanation

  • Updating a wiki or even worse a document, every time you release a database patch is not a solution... that patch should contain the relevant documentation!

  • ER diagrams can be easily generated based on the schema information, however this is not the easiest form of documentation to read... for anything more than 10 tables!

  • There is the saying (please comment if you know who said this! - respect)

    Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing

    Why SQL doesn't provide the means to any?

Community
  • 1
  • 1
urban
  • 5,392
  • 3
  • 19
  • 45

1 Answers1

2

Any kind of documentation would get stalled if not maintained.

Also, the SQL world provides all kind possibilities to document things:

  • comments in SQL files
  • comments in columns/tables metadata
  • as you said - E/R diagrams
  • the classic way of documenting stuff - docs and wikis
  • good discipline in adhering to an intuitive naming scheme for the things in DB - I think this should be the standard

We have all the tools we need, we just have to convince our managers to let us write the docs (lol)

Tudor Constantin
  • 26,330
  • 7
  • 49
  • 72
  • True for the first sentence! True for managers too :) but wouldn't you expect a more automated way of documenting SQL schemaz, as opposed to doc/wiki? Something along the lines (given the SQL capability to do so!): `mysql-doc ` and voila:) : field values, ER diagrams, etc. In a similar way you do `javadoc`, `doxygen`, `sphinx`, etc – urban Apr 22 '16 at 20:12
  • btw, naming scheme... probably the most important (same in programming... consistency :)) which is a good point (+1)! and the reason python has PEP and PHP has PSR... what does SQL have? :) – urban Apr 22 '16 at 20:14
  • you can generate E/R diagrams from mysql databases using MySQL workbench. In order to be able to generate some meaningful documentation, you need to **have that info** and you need to have it up to date - no tool will keep this info up to date. – Tudor Constantin Apr 22 '16 at 20:16
  • Exactly! In the software world, where my friend comes from (:)) there is a convention (doc-blocks) that allows you to store this info "close to" your code and not in any external system, wiki, doc, etc. MySQL workbench is an awesome tool which I very often use. However, my question is, given the increasing complexity of data and data structures, should SQL itself (comments, meta-data or otherwise) provide "inline" documentation space? We are using standards from '92/98 :) [just googled that.. there are more recent ones... not sure if they address the problem] – urban Apr 22 '16 at 20:24