0

I have many years of experience as a software engineer, and have worked extensively with databases, mostly Oracle and Postgres. I use what might charitably be called an informal design methodology for database schemas. I sketch out something like an E/R diagram, and I then generate DDL from there. Over time, I modify the schema from there as more requirements arrive. I had rigorous training on the academic side of computer science, and I wrote MSc and PhD dissertations on topics relating to databases. I understand dependencies, normal forms, and the decomposition approach. And I find this approach to schema design to be completely useless in the real world.

I am now teaching a senior-level course on database systems, and I dutifully covered the classical material on schema design, including dependenices, normal forms, decomposition. But I remain unconvinced of the actual value of this approach.

The textbook discussions of these theoretical topics, start with very badly designed schemas, and functional dependencies that come from ... well, I don't know. They're just there, and then they guide you to a better schema. But starting with a good Entity/Relationship model, you probably start with a pretty good schema. And if you understand what your entities are, and what their attributes are -- aren't you basically starting with tables already in BCNF?

For those of you who design and maintain schemas, do you actually use dependency theory and normal forms? Or do you just wing it like I do?

philipxy
  • 14,867
  • 6
  • 39
  • 83
Jack Orenstein
  • 169
  • 2
  • 9
  • I think in terms of functional dependencies when I design tables or queries. When I join tables, I need to know what columns determine/identify the output, to know if I need to do a GROUP BY. That requires understanding FDs and mental algebra using Armstrong's axioms. I know people wing it successfully, but I wouldn't want to work like that again. – reaanb Feb 22 '18 at 22:46
  • 3
    I'm specifically interested in designing tables. How do you obtain the FDs? How do you get your initial table definition that then benefits from decomposition? – Jack Orenstein Feb 23 '18 at 05:05
  • When I'm planning a feature, I often think "to do this, we need an FD from X to Y". E.g. when designing UI screens or reports, if the requirement is to show one Y for each X, it'll need a corresponding FD (which could be transitive). If the FD doesn't exist and isn't viable to add, the screen or report needs to be adjusted to handled multiple Y for each X. As for decomposition, there's an unending supply of legacy tables that were built or adapted in an ad-hoc fashion. I also often use those techniques when fixing inconsistent data. – reaanb Feb 23 '18 at 09:46
  • So in the legacy case: Where do you get the FDs from? If you have a legacy schema that is poorly designed, I am guessing you don't have FDs in any documentation that you inherit. You could get possible FDs from the data, but that has obvious problems. (Even ruling out FDs based on data has problems.) – Jack Orenstein Feb 23 '18 at 17:38
  • Before considering normalizing a table, I familiarize myself with the table's contents and check for any related FK constraints and views. Then I'll search the application source for any references to the table and its fields, looking at queries and surrounding code. If I had any doubts I would also discuss the schema with the other devs and the product owner. I'm a domain expert too, having worked 20+ years in my current industry, and so I have ample resources to help me understand the requirements and clarify or decide what the FDs should be. – reaanb Feb 23 '18 at 18:19
  • 2
    *"And I find this approach to schema design to be completely useless in the real world."* Why? Is it inaccurate? Is it indeterminate? Do you not understand where functional dependencies come from? – Mike Sherrill 'Cat Recall' Feb 26 '18 at 21:35
  • If I understand a model, then I know the entities and attributes, and I know the FDs also. But knowing the entities and attributes, I am able to create a schema which is almost certainly going to have BCNF tables. The academic discussions of dependencies and decomposition start with bad schemas, FDs seemingly from nowhere, and highly unrealistic examples (IMHO). Hence my question. – Jack Orenstein Feb 27 '18 at 04:21
  • 2
    You are likely much better to ask at [dba.se]. (Or [cs.se].) Although I find this XYish--your actual interest seems to be 'is there an "actual value of this approach"?'. Normalization is poorly presented in textbooks & execrably presented outside textbooks & so is virtually always poorly learned or misconceived, or just not learned, so what would "do you use" get you? (I keep meaning to post an answer--to the effect that even the best presentations are a mess but that normalization is crucial & ER modelling inadequate--but justification makes it a bit of an essay.) – philipxy Apr 27 '18 at 09:57

5 Answers5

3

Textbooks and courses often explain normalization with examples of progressive decomposition - starting with schemas where dependencies are not properly enforced by keys and then progressing to better designs that satisfy BCNF, 5NF, etc. This is a teaching exercise used to explain some concepts and techniques; it is not a blueprint for how actual database design ought to be done. It is analogous to practicing long division in maths classes, not because the method is commonly used but because knowledge of fundamental arithmetic is important.

I have used functional dependency analysis to resolve some difficult cases, to validate designs and to do normalization by synthesis. There are a few CASE tools that support normalization by synthesis and perhaps it is a shame that more of the mainstream software tools do not.

nvogel
  • 24,981
  • 1
  • 44
  • 82
3

When I was building databases, I generally started with a good ER model. I needed this to check my understanding of the subject matter. Upon converting the ER model into a Relational model, the result was generally in 3NF, often in BCNF. Usually good enough for OLTP work. For OLAP work, I used star schema design. This was full of update anomalies, which I dealt with by being careful with ETL. Just my take.

Walter Mitty
  • 18,205
  • 2
  • 28
  • 58
3

Usually the combination of a solid understanding of the business (often acquired through thoroughly elaborated conceptual modeling) and not being too unexperienced in design practice, will be sufficient to arrive at 5NF designs "from the get-go" in most use cases. So the "procedure of normalization" as commonly illustrated / suggested by textbooks is almost never actually practised. Applying that procedure is kind of a "bottom up" approach which feels quite unnatural to most practicing designers, who will vastly prefer "top down" starting from conceptual models which will often already be "decomposed" in exactly the way you'd end up applying normalization procedure as a method.

That does not mean normalization theory per se can go out the window. It still constitutes the formal underpinnings of why certain designs are "better" than other alternative designs addressing the same problem.

FD theory is also a crucial piece of material that DBMS designers ought to know. For example, relational DBMS's need to be able to do what is called "key inference" on relational expressions (i.e. compute which keys the output of, say, a JOIN, is guaranteed to comply with, given what the keys of the inputs are. Such inference would not be possible without FD theory.)

As for "if you understand what your entities are, and what their attributes are -- aren't you basically starting with tables already in BCNF?" , that does depend a bit on your conceptual-level entities having been identified "properly" (for some meaning of that latter word - what I'm referring to is that just like people can come up with poor database designs, they can also come up with poor conceptual models and if you use such a poor model to base your db design on, well you can guess what comes out).

Erwin Smout
  • 18,113
  • 4
  • 33
  • 52
0

reannb talks about various artifacts that can provide information on FDs: domain expertise, UI design, FK constraints, source code, queries, talking to other devs. In other words, you either just somehow know FDs (domain experts, other devs), or you are looking at distillations of knowledge from those people (UI, FK, code). But where do the domain experts get the FDs from? If it isn't an E/R model -- either explicitly, or an internal, intuitive one -- then what is the source of the FDs?

Erwin Smout says, basically, GIGO, which is obviously true. But that still doesn't say where the FDs come from if not an E/R model.

So I still don't understand: Where do FDs come from, if not an E/R model? Let me clarify: I am not saying that normalization theory is useless, I agree with Erwin Smout's points on the subject. Also, I am not asking because I am a newbie, (see my original post). My question has to do with the teaching of schema design. The discussions of normalization theory seem highly contrived. They start with a very badly designed schema, and functional dependencies that come from ... well, they never say where the FDs come from. Apply the rules and voila, we have a BCNF schema. It seems to me that a more plausible, realistic, and useful approach is to say:

  • Schema design starts with an E/R model.

  • Here are the procedures for generating a set of table and FK definitions from the E/R model.

  • Now notice that your E/R model actually implies these functional dependencies.

  • Then go into normalization theory and show how BCNF decomposition (for example), yields the same schema from pretty bad starting points.

  • If you inherit a bad schema, developing a clear E/R model and functional dependencies can help you figure out a good schema.

Jack Orenstein
  • 169
  • 2
  • 9
  • You could also ask "where does the E/R model come from". Answer : designers interviewing subject matter experts. And basically that's it. Everything comes from there. Now whether the *results* of those interviews get written down in the form of relation schemas and sets of FDs specifically, is another matter. In practice I've never seen that happen. So you observe correctly that in practice, doing any such thing as "BCNF decomposition" will always lack the input that is formally required to complete the process. ... – Erwin Smout Jun 07 '18 at 07:15
  • 2
    But I don't believe db designers can afford to be ignorant of the various NFs *and their properties in terms of "update anomalies" that will come with any design in any particular xNF*. ... – Erwin Smout Jun 07 '18 at 07:24
  • That said, I also believe it should be taught what&where the shortcomings of FD theory are. For example how it fails to deal with "non-overlap requirements" between ranges of times/dates in temporal databases. For example how SQL's NULL blurs, or even nullifies, many properties a design might be believed to have based on its schemas-plus-FDs. (If you have a schema XY with an FD X->Y, can you then have two tuples (null, "A") and (null, "B") ???) etc. etc. (Note carefully I'm not saying this latter thing to advocate NULL in any way !) – Erwin Smout Jun 07 '18 at 07:34
  • 1
    The business situations that can arise & the table meanings together determine the constraints. The constraints describe invariant properties of the database state & simultaneously, via the meanings, of the business situation. An FD holds when its determinant subtuple has exactly 1 value for its determined subtuple in its table. That simultaneously says that a certain implication holds in the business situation. (Eg Exactly 1 employee manages a department.) We declare the FK because we see that the constraint and/or implication holds. We don't need an ER design for this, just a relational one. – philipxy Jun 07 '18 at 13:55
  • This isn't an answer to the question. It restates/clarifies/extends the question. Part should be incorporated into the question via an edit, not posted as an answer. Also, it mostly asks a new (related) question, which is, where to FDs come from? So that should be integrated carefully (perhaps as an example) if not posted as a separate question. (See also my XY comment on the question.) PS [Re normalization & meanings.](https://stackoverflow.com/a/50516397/3404097) – philipxy Jun 07 '18 at 15:33
  • Erwin Smout: That is an excellent point -- the ER model and the FDs come from the same place. So if we say, in ER terms, that an entity E has properties A, B, C, D; and that E is uniquely identified by (A, B), then we have the FD AB -> CD. Furthermore, decomposition tends to put each key in its own table. So the two approaches really do point in the same direction. Some parts of normalization theory address situations with overlapping entity keys. However, it is not clear to me whether ER design would really produce a schema in which you would need to untangle things using FDs. – Jack Orenstein Jun 07 '18 at 17:38
  • 2
    No, because if all you have is an ER schema specifying keys, then *by definition* the only FDs you can reasonably assume to apply are those that "derive from the keys". And then *by definition* it must be the case that the corresponding logical relation schemas are in BCNF. BTW I have a suggestion for you. Try to find "the FCO-IM book" online. It's about an alternative technique for conceptual modeling, and it has some great detailed stuff about how that interviewing process ought to go. Not all its talk is equally fortunate/commendable, but I think you'll enjoy it. – Erwin Smout Jun 08 '18 at 16:16
0

"do you actually use dependency theory and normal forms? Or do you just wing it like I do?"

I have been designing database schemas for more than fifteen years.

However, I have never used dependency theory (functional dependency analysis- FDA) Nor do I "wing it".
And yet, all of my schema designs are in fifth normal form. (5NF)

My secret is that I use the formal method called "object-role modeling" I use a free tool called NORMA to design a formal model from which the NORMA tool can automatically generate a 5NF logical model.

During the modeling process, I have a "relational view" window open that automatically and instantly generates a 5NF logical view of my object-role model in its "current state".

When I am satisfied with my design, I choose a target RDBMS (e.g. SQL Server) and a few mouse clicks later I have the SQL DDL that I then cut and paste into the "New Query" panel in SQL Server Management Studio. This method is far more effective than FDA or winging it.

You can download the free NORMA tool from here. The tutorials are here.

By the way - I recently heard about a university who have dumped entity-relationship modeling in favour of teaching object-role modeling.

Confession: I did use FDA as a student on a couple of University courses but that was enough to put me off it for life!

Ken Evans
  • 362
  • 1
  • 11