14

You usually normalize a database to avoid data redundancy. It's easy to see in a table full of names that there is plenty of redundancy. If your goal is to create a catalog of the names of every person on the planet (good luck), I can see how normalizing names could be beneficial. But in the context of the average business database is it overkill?

(Of course I know you could take anything to an extreme... say if you normalized down to syllables... or even adjacent character pairs. I can't see a benefit in going that far)

Update:

One possible justification for this is a random name generator. That's all I could come up with off the top of my head.

Kenneth Cochran
  • 11,954
  • 3
  • 52
  • 117
  • I'm not sure what you mean, are you talking about setting up a separate table like "FirstNames" and then linking users to it with "FirstNameID" or something like that? – Chad Birch Apr 23 '09 at 16:37
  • What kind of table do you have full of names? If there's multiple references to the same -person- and not just the same -name- then yes, you should probably have a table of people. – Chad Birch Apr 23 '09 at 16:38
  • 9
    Erm. Aren't names already basically nomalised down to characters anyways? Surely that's the entire point of an alphabet? – Ed James Apr 23 '09 at 16:38
  • Well, since many names can be first, middle or last names a simple "Name" table would produce the least redundancy. Then a "FullName" table would have three foreign keys all pointing to the same "Name" table. Of course, this assumes a culture that doesn't have more than three names to a individual. Could be useful for a random name generator. No? – Kenneth Cochran Apr 23 '09 at 16:53
  • You'd have to do joins on both the `first_names` table and `last_names` table (maybe even `middle_names` table). Among other cons (i.e. worse performance, last thing **Yishai** said), that would be a hassle! – JohnB Jun 01 '10 at 20:41
  • Is there a variant of this question that is not about the meaning but implementation? That is: does it reduce storage use? Does it improve performance? Some kind of a lookup table, so if someone changes his name, he would get a new nameid in his Users table row, with a new row in the Names table if not already present. – David Balažic Jun 06 '18 at 11:48

19 Answers19

54

Yes, it's an overkill.

People don't change their names from Bill to Joe all at once.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • 22
    This really illustrates the point. The reason for normalization is so that if you have to change something that's the same all over the database that you don't have to change a whole bunch of records. But you would never have to change the name of everybody names Bill to Joe. – Kibbee Apr 23 '09 at 16:48
  • I'm not disputing the claim that it's overkill for most cases but I think your reasoning against it is flawed. Just playing devil's advocate here... Say you have a FirstName table and a FullName table. if a single person changed their name from Bill to Joe wouldn't you update the foreign key in the FullName table and not the record in the FirstName table? – Kenneth Cochran Apr 23 '09 at 17:08
  • @Kenneth: Denormalization is just for the cases I described: it simplifies mass updates. If a company changes its name, you just change one field in the joined table and the SQL query the badge printing machine issues returns new results. Joins are bad for performance and complexity, the only thing they are good for is mass updates. – Quassnoi Apr 23 '09 at 17:20
  • 1
    REPLACE('Denormalization', 'Normalization') in the comment above – Quassnoi Apr 23 '09 at 17:22
  • I think there's more to normalization than mass updates. You look at nearly any contact database and you'll see City and State tables but I can't remember the last time a city or state changed their name. – Kenneth Cochran Apr 23 '09 at 17:39
  • @Kenneth: name is not the only property of a city. There is area code, population, and thousand more properties, which can be changed all right. As for a name as an entity, it does not have other properties but its spelling. – Quassnoi Apr 23 '09 at 18:16
  • @Kenneth: But you are right, normalization is good for one more thing: indexability. Creating an index on a surrogate INTEGER primary key is much easier than on a random property string. – Quassnoi Apr 23 '09 at 18:17
  • 6
    @Kenneth: By the way, my grandgrandmother from this post: http://stackoverflow.com/questions/768123/how-would-you-live-without-mysql/768135#768135 was born in Saint-Petersburg, married in Petrograd, defended Leningrad during the blockade, and died in Saint-Petersburg. And she never moved out of the city. – Quassnoi Apr 23 '09 at 18:28
35

Database normalization usually refers to normalizing the field, not its content. In other words, you would normalize that there only be one first name field in the database. That is generally worthwhile. However the data content should not be normalized, since it is individual to that person - you are not picking from a list, and you are not changing a list in one place to affect everybody - that would be a bug, not a feature.

Yishai
  • 90,445
  • 31
  • 189
  • 263
6

How do you normalize a name? Not all names have the same structure. Not all countries or cultures use the same rules for names. A first name is not necessarily just a first name. People have variable numbers of names. Some countries don't have the simple pair of firstname/lastname. What if my first name just so happens to be your last name, should they be considered the same in your database? If not, then you get into the problem that last name might mean different things in different countries. In most countries I know of, it is a family name. Your last name is the same as at least one of your parents' last name. On Iceland, it is your father's first name, followed by "son" or "daughter". So the same last name will mean completely different things depending on whether you encounter it in Iceland and the US.

In some cultures it is common when getting married, for the woman to take her husband's last name. In other cultures, that's completely optional, or might even work the opposite way.

How can you normalize this? What information would it gain you? If you find someone in your database who has "Smith" as the last word making up their name, what does that tell you? It might not be their family name. It might only be part of the family name. It might be an honorary in some language, but which according to their culture, should be considered part of the name.

You can only normalize data if it follows a common structure.

jalf
  • 243,077
  • 51
  • 345
  • 550
5

If you had a need to perform queries based on diminutive names I could see a need for normalizing the names. e.g. a search for "Betty" may need to return results for "Betty", "Beth", and "Elizabeth"

ccsimpson3
  • 51
  • 1
  • 3
2

Yes, definitely overkill. What's a few dozen bytes betewen friends?

Alnitak
  • 334,560
  • 70
  • 407
  • 495
2

Maybe if you work in the Census office it might make sense. Otherwise, see every other answer :)

Patrick McDonald
  • 64,141
  • 14
  • 108
  • 120
1

I would say yes, it is going too far in 95%+ of the cases.

Mark Sherretta
  • 10,160
  • 4
  • 37
  • 42
  • 3
    Okay, I'll bite; what are the other 5% of cases? :-) – Paul Sonier Apr 23 '09 at 16:41
  • I'm biting too, don't you mean 5%- ;-) – Patrick McDonald Apr 23 '09 at 16:44
  • 2
    Half of the Koreans are Kim's, Lee's or Park's, and there are only about 250 different surnames (they could probably even fit only one byte!) – Quassnoi Apr 23 '09 at 16:44
  • Good point. But fortunately, modern DBMS recognize such extreme fields and adapt their storage and indexing behavior to such data patterns, so it is handled on a lower level than the data model. So formally, every Korean can have a distict first name, even if most don't, and still the db handles them as efficient as a bot army with numbered, unique first names. (Or even more efficient.) – TheBlastOne Jul 19 '10 at 16:34
  • 1
    TheBlastOne: Can you please name a DBMS that can recognize that there are only 250 different surnames in my table and optimize them to only require 1 byte of storage? I'm not familiar with one. – Gabe Jul 19 '10 at 16:42
  • @Gabe certain columnar in-memory databases would recognize this automatically and compress the representation of its value. – adib Jul 07 '17 at 07:38
1

Generally yes. Normalizing to that level would be going to far. Depending on the queries (such as phone books where searches by last name are common) it might be worthwhile. I expect that to be rare.

GEOCHET
  • 21,119
  • 15
  • 74
  • 98
Adam Davis
  • 91,931
  • 60
  • 264
  • 330
  • 2
    And even in the phone book scenario a good index would be all that's needed, but there are still very narrow uses where normalizing the name makes sense. – Adam Davis Apr 23 '09 at 16:39
1

Yes. I cannot think of an instance where the benefits outweigh the problems and query complications.

billb
  • 3,608
  • 1
  • 32
  • 36
1

No, but you might want to normalise to a canonical record for a customer (so you don't get 5 different entries for 'Bloggs & Co.' in your database. This is a data cleansing issue that often bites on MIS projects.

ConcernedOfTunbridgeWells
  • 64,444
  • 15
  • 143
  • 197
1

You often don't go over fourth form normalization in a database. Therefore seventh form normalization is quite a bit overboard. The only place this might even be a remotely plausible idea is in some kind of massive data warehouse.

David
  • 24,700
  • 8
  • 63
  • 83
0

I generally haven't seen a need to normalize the name, mainly because that adds a performance hit on the join that will always be called, and doesn't give any benefit.

If you have so many similar names, and have a storage problem then it may be worth it, but there will be a performance hit that would need to be considered.

James Black
  • 41,583
  • 10
  • 86
  • 166
0

I would say it is absolutely overkill. In most applications, you display folks' names so often, every query involved with that is going to look that much more complex and harder to read.

Dana
  • 32,083
  • 17
  • 62
  • 73
0

Yes, it is. It is commonly recognized that just applying all of the Rules of Normalization can cause you to go way too far and end up with an overnormalized database. For example, it would be possible to normalize every instance of every character to a reference to a character enumeration table. It's easy to see that that's ridiculous.

Normalization needs to be performed at a level that is appropriate for your problem domain. Overnormalization is as much a problem as undernormalization (although, of course, for different reasons).

Paul Sonier
  • 38,903
  • 3
  • 77
  • 117
0

There might be a case where being able to link married/maiden names would be useful.
Recently had a case where I had to rename thousands of emails in exchange because somebody got divorced and didn't want any emails listing her as married_name@company.com

Martin Beckett
  • 94,801
  • 28
  • 188
  • 263
0

No need to normalize to that level unless the names make up a composite primary key and you have data that is dependant on one of the names (e.g. anyone with the surname Plummer knows nothing about databases). In which case, by not normalizing, you would violate second normal form.

John Plummer
  • 233
  • 2
  • 6
0

I agree with the general response, you wouldn't do that.

One thing comes to mind though, compression. If you had a billion people and you found that 60% of first names were pulled from 5 very common names, you could use some tricky bit manipulation to reduce the size very significantly. It would also require very customized database software.

But this isn't for the purpose of normalization, just compression.

Bill K
  • 62,186
  • 18
  • 105
  • 157
0

You should normalize it out if you need to avoid the delete anomaly that comes with not breaking it out. That is, if you ever need to answer the question, has my database ever had a person named "Joejimbobjake" in it, you need to avoid the anomaly. Soft deletes is probably a much better way than having a comprehensive first name table (for example), but you get my point.

JP Alioto
  • 44,864
  • 6
  • 88
  • 112
0

In addition to all the points everyone else has made, consider that if you were implementing a data entry operation (for example), and were to insert a new contact, you would have to search your first name and last name tables to locate the correct Id's and then use those values. But then this is further complicated by the occasion when the name is not on the FN and/or LN tables, then you have to insert the new first/last name and use the new id(s).

And if you think that you have a comprehensive list of names, think again. I work with a list of over 200k unique first names and I'd guess it represents 99.9% of the US population. But that .1% = a lot of people. And don't forget the foreign names and misspellings...

Marc Bernier
  • 2,928
  • 27
  • 45