4

In the database world, we have normalisation. You can start with a design, crank through the steps, and end up with a normal form of the database. This is done on the basis of the semantics of the data and can be thought of as a series of design refactorings.
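To make the "crank through the steps" point concrete, here is a minimal sketch in Python (invented data and names, not any real schema) of one such mechanical step: attributes that depend only on the customer key are split out of a flat orders relation into their own relation.

```python
# Hypothetical data, for illustration only: a flat "orders" relation
# where the customer's attributes are repeated on every row.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ann", "customer_city": "Oslo", "total": 50},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ann", "customer_city": "Oslo", "total": 75},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bob", "customer_city": "Bergen", "total": 20},
]

def normalise(rows):
    """One mechanical normalisation step: attributes that depend only on
    customer_id move into their own relation, keyed by customer_id."""
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer_id"]] = {
            "name": row["customer_name"],
            "city": row["customer_city"],
        }
        orders.append(
            {"order_id": row["order_id"],
             "customer_id": row["customer_id"],
             "total": row["total"]}
        )
    return customers, orders

customers, orders = normalise(flat_orders)
print(len(customers), len(orders))  # 2 3: each customer fact now exists exactly once
```

The point of the analogy is that the step is purely mechanical, driven by the functional dependencies in the data, not by taste.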

In object-oriented design, we have the SOLID principles and various other ad hoc guidelines towards good design.

Do you think it is possible to define the equivalent of normal forms for OO, such that a series of refactoring steps could move a procedural piece of code (or poorly factored OO design) into a correct (in some well-defined sense) formulation with the same functionality?

(NB. Happy to make this community wiki)

Matt Enright
WW.

3 Answers

8

It is a possibility, but highly unlikely.

Context

First, in the days when the Relational Model came out, people who worked in IT were more educated and they esteemed standards. Computer resources were expensive, and people were always looking for the best way to use those resources. People like Codd and Date were giants in an industry where people were high tech.

Codd did not invent Normalisation; we were Normalising our non-relational databases long before Relational came along. Normalisation is a theory and practice, published as the Principle of Full Normalisation. We were Normalising our programs too: we considered accidental duplication of a subroutine (method) a serious error. Nowadays it is known as Never Duplicate Anything or Don't Repeat Yourself, but the recent versions do not acknowledge the sound academic theory behind them, and their power is therefore unrealised.
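As a minimal sketch of the error described above (hypothetical names and a made-up discount rule): the same rule written out twice must be kept in sync by hand, whereas the "normalised" program states it exactly once.

```python
# Hypothetical example: the same discount rule duplicated in two places.
def invoice_total_with_duplication(items):
    total = sum(price * qty for price, qty in items)
    if total > 100:          # discount rule, copy #1
        total *= 0.9
    return total

def quote_total_with_duplication(items):
    total = sum(price * qty for price, qty in items)
    if total > 100:          # discount rule, copy #2 -- must be changed in lockstep with copy #1
        total *= 0.9
    return total

# The "normalised" program: the rule exists exactly once, in one subroutine.
def apply_discount(total):
    return total * 0.9 if total > 100 else total

def invoice_total(items):
    return apply_discount(sum(price * qty for price, qty in items))

def quote_total(items):
    return apply_discount(sum(price * qty for price, qty in items))
```

Both versions behave identically today; they diverge the first time someone changes one copy of the rule and forgets the other.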

What Codd did (among many things) was define formal Normal Forms specifically for Relational Databases. And these have progressed and been refined since then. But they have also been hijacked by non-academics for the purpose of selling their gear.

The database modelling that was invented by Codd and Chen, and finished by Brown, had a solid grounding. In the last 25 years, it has achieved Standardisation and been further refined and progressed by many others who had solid grounding.

The World Before OO

Let's take the programming world before OO. We had many standards and conventions for modelling our programs, as well as for language- and platform-specific implementation. Your question simply would not apply in those days. The entire industry understood deeply that database design and program design were two different sciences, and used different modelling methodologies for each, plus whatever standards applied. People did not discuss whether they implemented standards; they discussed the extent to which they complied with standards. They did not discuss whether they modelled their data and programs; they discussed the extent to which they did. That is how we put men on the Moon, notably in 1969.

Dawn of OO

OO came along and presented itself as if no other programming language or design methodology existed before it. Instead of using existing methodologies and extending or changing them, it denied their existence. So, not surprisingly, it has taken 20 years to formulate the new methodologies from scratch and slowly progress them to the point of SOLID and Agile, which are not yet mature; hence your question. It is telling that more than twenty such methodologies have flashed up and died during that time.

Even UML, which could have been an outright winner, applicable to any programming language, suffered the same disease: it tried to be everything to everyone while denying that mature methodologies existed.

Demise of the Industry

With the advent of MS and the attitude of "anyone can do anything" (implication: you do not need formal education or qualifications), that quality and pride of profession has been lost. People now invent things from scratch as if no one on the planet has ever done them before. The IT industry today is very low tech. You know, but most people reading these pages do not, that there is one Relational Modelling methodology, and one Standard. They do not model, they implement. Then re-implement. And re-implement. Re-factoring, as you say.

OO Proponents

The problem was that the people who came up with these OO methods were not giants among professionals; they were simply the most vocal of an un-academic lot. Famous due to publishing books, not due to peer acknowledgement. Unskilled and unaware. They had One Hammer in their toolkit, and every problem looked like a nail. Since they were not formally educated, they did not know that database design and program design are in fact two different sciences; that database design was quite mature and had strongly established methodologies and standards; and they simply applied their shiny new hammer to every problem, including databases.

Therefore, since they were ignoring both programming methodologies and database methodologies, reinventing the wheel from scratch, those new methodologies have progressed very slowly. And with assistance from a similar crowd, without sound academic basis.

Programs today have hundreds of methods that are not used. We now have programs to detect that. Whereas with the mature methodologies, we prevent that. Thin client was not a goal to be achieved, we had a science that produced it. We now have programs to detect "dirty" data and to "clean" it. Whereas in the upper end of the database market, we simply do not allow "dirty" data into the database in the first place.

I accept that you see database design as a series of re-factorings; I understand what you mean. To me it is a science (methodology, standards) that eliminates ever having to re-factor. Even the acceptance of re-factoring is a loud signal that the older programming methodologies are unknown and that the current OO methodologies are immature. The danger, what makes it annoying to work with OO people, is that the methodology itself fosters a confidence in the One Hammer mentality, and when the code breaks, they have not one leg to stand on; when the system breaks, the whole system breaks; it is not one small piece that can be repaired or replaced.

Take Scott Ambler and Agile. Ambler spent 20 years publicly and vociferously arguing with the giants of the database industry, against Normalisation. Now he has Agile, which, although immature, has promise. But the secret behind it is Normalisation. He has switched tracks. And because of his past wars, he cannot come out and declare that honestly and give others due credit, so it remains a secret, and you are left to figure out Agile without its fundaments being declared.

Prognosis

That is why I say, given the evident lack of progress in the OO world over the last 20 years; the 20 or so OO methodologies that have failed; the shallowness of the approach; it is highly unlikely that the current OO methodologies will achieve the maturity and acceptance of the (singular) database design methodology. It will take at least another 10 years, more likely 20, and by then OO will have been overtaken by some replacement.

For it to be a possibility two things need to happen:

  1. The OO proponents need formal tertiary education. A good grounding in the science of programming. Sure, anyone can do anything, but to do great things, we need a great grounding. That will lead to the understanding that re-factoring is not necessary, that it can be eliminated by science.

  2. They need to break their denial of other programming methodologies and standards. That will open the door to either building OO on top of that, or taking the fundaments of that and merging it into OO. That will lead to a solid and complete OO methodology.

Real World OO

Obviously I speak from experience. On our large projects we use the mature analysis and design methodologies: one for database and another for function. When we get to the code-cutting stage, we let the OO team use whatever they like, for their objects only, which usually means UML. No problems with architecture or structure or performance or bloatware or One Hammer or hundreds of unused objects, because all of that was taken care of outside OO. And later, during UAT, no problems with finding the source of bugs or making the required changes quickly, because the entire system has a documented structure; the blocks can be changed.

WW.
PerformanceDBA
  • @PerformanceDBA: Nicely put, and thanks for the very solid info. I am an example of the idea that "anyone can do anything", as I have no formal CS education whatsoever, so I am glad for myself that the world is the way it is today. However, I also totally understand your reasoning, and can honestly say I WISH I had more formal education. Your perspective is definitely informative, and provides a target for those of us learning the craft! – XIVSolutions Jan 28 '11 at 14:17
  • @PerformanceDBA: Also, duly noted about the manner in which the formerly high modelling and development standards of the past have slipped. In one sense, it IS nice that the evolution of the technology has broadened performance tolerances to the point where folks such as myself are able to pick up programming as an "Enthusiast" without the years of formal CS education. On the other hand, back in the day one simply HAD to know what was happening, and adhere to highly formal methodologies, in order to create working programs within the memory and processing limitations of machines at the time. – XIVSolutions Jan 28 '11 at 14:23
  • Last - Totally agree about UML. If it had not become bloated in an attempt to do everything for everyone, it was on the right track. Even some of the original proponents agree (I seem to recall it was Grady Booch who made this observation, but one of them did . . .). – XIVSolutions Jan 28 '11 at 14:27
  • 1
    Thank you for your well reasoned answer. Regarding "I accept that you see database design as a series of re-factorings": what I meant was that there is a defined series of steps to normalise a design. An experienced designer would generally skip directly to the normalised form. I asked this question originally because I think OO is still looking for its Codd. – WW. Jan 29 '11 at 00:54
  • 1
    @WW. You're welcome. Beautifully stated. You're right, with practice, we pretty much go straight to 5NF; but that is after years of working the Steps. – PerformanceDBA Jan 30 '11 at 12:52
  • @XIVSolutions. My pleasure. If you're interested in Data modelling, I can give you links. – PerformanceDBA Jan 30 '11 at 12:53
  • I will hungrily devour any pre-parsed, already-determined-to-be-worthwhile-by-someone-in-the-know links . . . – XIVSolutions Jan 30 '11 at 16:08
  • 2
    @XIVSolutions. I didn't mean that kind of link (the web is full of rubbish posted by amateurs). I meant this kind, for someone eager: [Answer 1](http://stackoverflow.com/questions/4061826/is-my-data-normalized/4062053#4062053) [Answer 2](http://stackoverflow.com/questions/4771528/data-modeling-logical-modeling-exercise/4772849#4772849). [Answer 3](http://stackoverflow.com/questions/4491173/historical-auditable-database/4573129#4573129). [Answer 4](http://stackoverflow.com/questions/4132044/name-database-design-notation-you-prefer-and-why/4140309#4140309) ... there's more ... – PerformanceDBA Feb 03 '11 at 04:04
  • @PerformanceDBA - Yup. Those kind are even better. Thanks! And yes, even I have found that the web is full of rubbish . . . – XIVSolutions Apr 15 '11 at 13:54
2

I think this is an interesting question, because it presumes that Codd's Normal Forms are actually the definition of "correct" design. Not trying to start a flame war with that statement, but my point is that there are very good reasons that many DBs aren't fully normalized (e.g. join performance), which leads me to think that the real-world equivalent of normalization in OO space is probably design patterns or (as you said) SOLID. In both cases you're talking about idealized guidelines that have to be applied with a suitably critical eye, rather than slavishly followed as dogma.
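The join-performance trade-off can be sketched in miniature (hypothetical data, Python dicts standing in for relations): the normalised design answers a query by joining through the customer key, while a denormalised copy answers the same query with a single scan at the cost of repeating the city on every row.

```python
# Hypothetical relations, for illustration only.
customers = {10: "Oslo", 11: "Bergen"}              # customer_id -> city
orders = [(1, 10, 50), (2, 10, 75), (3, 11, 20)]    # (order_id, customer_id, total)

# Normalised: join each order row to its customer at query time.
oslo_revenue = sum(total for _, cid, total in orders if customers[cid] == "Oslo")

# Denormalised: the city is copied onto every order row (redundant,
# can drift out of date, but the query needs no join/lookup).
orders_denorm = [(1, "Oslo", 50), (2, "Oslo", 75), (3, "Bergen", 20)]
oslo_revenue_denorm = sum(total for _, city, total in orders_denorm if city == "Oslo")

print(oslo_revenue, oslo_revenue_denorm)  # 125 125: same answer, different costs
```

The same result either way; the choice is the classic trade of update anomalies against read-time work.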

Paul
  • Normal forms provide a clear definition of "correctness". Choosing to break them for various reasons is then an option. I don't think with OO design we are starting from a position of well-defined correctness and then moving away for practicality, I think we're making it up as we go along. – WW. Jan 29 '11 at 00:49
  • 1
    Er, Codd's NFs **are** the definition; IDEF1X **is** the methodology. Your post is classic of people who are oblivious to both; justifying non-design; creating non-databases. The low end of the market does not "define" rules. – PerformanceDBA Feb 03 '11 at 04:54
  • Understanding when a rule applies and when it doesn't is good design. Slavishly following principles set down decades ago when the environment has changed is not. The DB is but one part of the architecture, and optimizing it at all costs can impact overall performance. – Paul Feb 03 '11 at 18:25
2

Not only do I fully agree with Paul, but I will go a step further.

Models are just that: only models. The Normalization models used by Relational Databases are only one approach to storing and managing data. In fact, note that while RDBMSs are common for data-manipulation operations (the standard CRUD), we have now evolved the Data Warehouse for consolidation, analysis, and reporting. And it most definitely does NOT adhere to the normalization models found in DML land.

Now we also have Google with their BigTable architecture, and Apache with Hadoop. These newer modeling systems reflect a change in the landscape, driven by the idea of the DISTRIBUTED database. Normalization need not apply for this club either.

We can apply a successful model only to the point at which it becomes not-so-successful, or is supplanted by a model which better suits the needs of the designer. Note the many ways we humans have modelled our universe through physics, astronomy, what have you. Modelling attempts to describe a system in discrete terms, but as the system, or the needs of the system, change, so must the model.

OOP is and has been a very, very successful way to model computer applications. However, the needs of the application designer are different from those of the database designer. Most of the time, there is a point at which the designer of an application must consider that his program will be interacted with by humans. Unlike the database designer, whose work will (mostly) be expected to interact with other code, the programmer's job is to take the machine and make it accessible to a much more random human being. This art does not map quite so well to standards like normalization.

All that said, n-tier, MVC, MVVM, and other paradigms DO establish some guidelines. But in the end, the problem space of application design is usually not as easy to fit into such discrete modelling steps as a relational database.

Wow. Apologies for the length. If this is a breach of etiquette here, do let me know . . .

cethegeek
XIVSolutions
  • 5
    Actually, the better (faster, less maintenance) DWs follow DB rules, Normalisation; not separate DW rules. You might not be aware that for 30 years we have had an unending array of OO databases that were going to replace RDb, but they did not last one year. – PerformanceDBA Jan 28 '11 at 08:24
  • 4
    XIVSolutions, your statement about DW is certainly untrue. Many data warehouses are designed to at least Boyce-Codd or 5th Normal Form. Normalization is an important discipline for DW design - especially for larger, more complex warehouses. – nvogel Jan 29 '11 at 13:39
  • @dportas - Absolutely agree that normalization is critical for DB design. And I will cop straight out to not having any direct experience with data warehouses. I must either be a victim of misinformation, or a bad memory. I was left with the impression that in a DW, data from one or more fully-normalized RDB(s) is aggregated in a different table structure which does not follow the rules (or at least, not the same rules) of normalization you would find in an RDB. If this is not true, then I stand corrected. – XIVSolutions Jan 29 '11 at 16:07
  • 1
    Obviously, I need to be more careful how I present my understanding of things. My normal learning process is to present what I think I know. I often qualify with phrases like "my understanding is . . .". However, the objective is to present what I know, and if I am corrected, then I have learned something. In this forum, though, I could then become guilty of further propagating bad information. Thanks for your feedback. – XIVSolutions Jan 29 '11 at 16:10