10

I am not a database guy, but am trying to clean up another database. So my question is would normalizing the gender table be going too far?

User table:
userid int pk,
genderid char(1) fk
etc...

gender table:
genderid char(1) pk,
gender varchar(20)

Now at first it seemed silly to me, but then I considered it because i can then have a constant data source to populate from or bind from. I will be using WPF. If it was another framework I would probably avoid it, but what do you think?

Adam
  • 1,580
  • 21
  • 40
  • 4
    This is maybe not what you are asking but creating a Gender table has nothing to do with normalization. In creating that table you are apparently just substituting the dependency userid->genderid in place of userid->gender. The User table is therefore no more normalized than it was without the Gender table. All you've done is change the name and type of an attribute. So purely from the point of view of normalization such a change is totally unnecessary. – nvogel Aug 18 '10 at 16:00
  • In most languages, it would be hard to write a grammatically-correct letter to someone without knowing their gender. – dan04 Aug 19 '10 at 13:37
  • 2
    Argh. My brain is broken by quasi-memes. Reading your question, I can't help thinking of this - http://qntm.org/gay (Which is an awesome read.) – pinkgothic Aug 19 '10 at 16:39
  • It also explains how it differs from the old polygamy and interracial marriage issues. Those were just a simple `CREATE UNIQUE INDEX` and `DROP CONSTRAINT`. – dan04 Aug 20 '10 at 01:17
  • 3
    @meagar: I would tell you it's none of your business why I had business requirements that said I need to collect the gender information. – Ken White Apr 28 '11 at 22:12
  • See also https://stackoverflow.com/questions/4175878/storing-sex-gender-in-database – Raedwald Oct 01 '19 at 11:15

7 Answers7

11

Whether or not you choose to normalize your table structure to accomodate gender is going to depend on the requirements of your application and your business requirements.

I would normalize if:

  • You want to be able to manage the "description" of a gender in the database, and not in code.
    • This allows you to quickly change the description from Man/Woman to Male/Female, for example.
  • Your application currently must handle, or will possible handle in the future, localization requirements, i.e. being able to specify gender in different languages.
  • Your business requires that everything be normalized.

I would not normalize if:

  • You have a relatively simple application where you can easily manage the description of the gender in code rather than in the database.
  • You have tight programmatic control of the data going in and out of the gender field such that you can ensure consistency of the data in that field.
  • You only care about the gender field for information capture, meaning, you don't have a lot of programmatic need to update this field once it is set the first time.
Ben McCormack
  • 32,086
  • 48
  • 148
  • 223
  • Thank you. I didn't consider the localization issue, which I would like to plan for just in case. – Adam Aug 18 '10 at 15:44
  • 1
    Also, other values allow multiple null meanings, such as "Not known", "Not disclosed", or other possibilities. – Adam Musch Aug 18 '10 at 18:07
  • 1
    @Adam I would make the field a `char(1)` and simply allow null. – user229044 Aug 18 '10 at 19:14
  • 1
    @meagar - and what does NULL mean then - unknown, not disclosed, undefined, transhuman? It's an indeterminate value. Plus, inclusion of nulls generally makes logic less clear (gotta do trivalue, or use coalesce()) when in this case, it's pretty avoidable. – Adam Musch Aug 19 '10 at 04:49
  • @meagar: The US Federal government defines more genders than male and female, such as transgender and transsexual. NULL isn't a gender; if you want to allow non-specified, make a value that means non-specified. NULL specifically means something in data and SQL - use it properly. @Ben: +1, for just the first paragraph. I'd give you a second vote if I could for the remainder of your answer. – Ken White Apr 28 '11 at 22:15
  • @Ken: In many people's books, the only way to use null properly is not to use it at all (as an actual persisted value). – Adam Robinson Apr 29 '11 at 00:16
  • @Ken I believe there are many more letters in the alphabet than M and F. And when exactly should you use null, if not to indicate a lack of value? – user229044 Apr 29 '11 at 03:47
  • @meagar: As @Adam said, NULL complicates things. When it's possible to not allow/use it, you should. It's appropriate when there's no other choice; in this case, there are alternatives. – Ken White Apr 30 '11 at 01:57
  • 1
    @Ken I'm begin completely open-minded here, but you're going to have to make a better case than "null complicates things". In 10 years of database development I've never come up against a situation where representing "unknown", "undefined" or "not set" with a NULL has caused a problem. Quite the opposite, it usually clarifies things significantly. – user229044 Apr 30 '11 at 23:48
  • @meagar: NULL complicates things because you have to handle it specially; that's why COALESCE and ISNULL exist. There's even a special way to handle NULL (is/is not) rather than the normal operators (equality, BETWEEN, and so forth). If you have a valid reason for allowing NULL, that's one thing. If you're allowing it because you're too lazy to create appropriate values (like M/F/Unknown), it's another. :) – Ken White May 02 '11 at 00:39
3

I'm also not a database guy but I do it. It gives me the possibility to assure that only the genders are entered, that are valid (referencial integrity) and I can also use it to populate the selection control.

HCL
  • 36,053
  • 27
  • 163
  • 213
3

I can think of applications where I'd use different columns for sex and gender, have three values for sex (male/female/decline to state) and six for gender (male/female/transgendered male/transgendered female/asexual/decline to state). Granted, I live in San Francisco, where there's an level of public discussion of transgender issues that much of the rest of the world is behind the curve on.

The point is: without a compelling reason to think otherwise, I'd assume that any simplifying assumption I made about demographics was limited and parochial. The cost of breaking sex out to its own table is small now and expensive later. I wouldn't avoid the small cost on the basis of an assumption.

Robert Rossney
  • 94,622
  • 24
  • 146
  • 218
1

Well, your company might have a requirement that, if possible, everything be normalized.

Also, depending on the business & data, you might need to include transgenders as well which would create 3+ genders (I don't know how many there are, haven't checked)

PowerUser
  • 11,583
  • 20
  • 64
  • 98
  • Even if there is one, you still would want to do it... Take the 5 mins now, rather than the 60 mins later... – NinjaCat Aug 18 '10 at 15:24
0

I'll remark on another aspect: sorting. Normally, 'M' sorts after 'F'; in a project one time, a database table had a gender field with either of those two values. There was a desire to be able to sort results on the gender (census data) and a further preference to have 'M' appear before 'F'. My solution was to add a separate lookup table, assigning the Male value an ID of 0, and Female an ID of 1. So queries on the main table could easily be sorted on the new genderID field.

Grant Palin
  • 4,546
  • 3
  • 36
  • 55
  • 4
    What database query language doesn't let you write `ORDER BY Gender DESC`? – dan04 Aug 18 '10 at 19:12
  • @dan04 Keep in mind that this was several years ago, so I'm not certain of the circumstances now. But I think the other requirement was to be able to filter on that column, as well as sort it, and numbers are easier to filter than characters. I think the sort+filter need led to the solution. – Grant Palin Aug 18 '10 at 21:54
0

Just thought I'd throw an opinion in here. @Ben McCormack has a great answer with a minor caveat: Regarding localization, there are sometimes better ways of handling this than having the values defined directly in your database.

For example, you mention WPF. With .Net you have various localization resources that are much better suited to managing differences in whether to emit "Male" or "Samec" (Czech).

By letting the built in localization features take care of this you don't have to worry about having multiple database records defining the exact same thing.. which could complicate reporting.


That said, I'd suggest that you might want to consider if "gender" is really what you are after. Gender is defined as "a set of characteristics distinguishing between male and female".

On the face of it this sounds like your standard Male/Female options; but it's not. Gender is much more complicated than that as it needs context in order to have meaning. For example, in the context of a relationship a Male (by Sex) could have one of several "genders": Masculine, Feminine or even Neutral. This is regardless of what sex their partner is.

In the context of just an individual, a Male (by Sex) might be Masculine, Feminine, Neutral, Transgender, Intersex or any of a number of other options acceptable to the person filling out the form.

At least one person commented that Gender is necessary in order to determine the honorific used in mailings. I'd suggest that there is no relationship between gender and those honorifics. For example, a Female (by Sex) might want to be addressed as Ms/Miss/Mrs/Dr/Madam/Professor or even Mr if they are in the process of, or have completed, surgery to become "Male". That list is by no means all inclusive and in any event it's much better to allow that person to select how they want to be addressed.


Which leads me to my last item: Before collecting any piece of data you should have a defined reason for having it. My company specializes in data collection through online forms. One of the things we do is look at what our clients are asking for and go field by field to determine if the data is even used anywhere.

More often than not an entity (company/governmental/etc) asks for far more information than they care about. This can have additional consequences in the event the data is lost, stolen or simply viewed by unauthorized individuals. Further there is a burden placed on the person filling out the forms for each field they are asked to complete.

I bring this up because "Gender" is almost never needed for any normal system. Instead, Sex is a better qualifier and even then it has little value. Exempting dating sites and governmental census.

NotMe
  • 87,343
  • 27
  • 171
  • 245
-2

Yes. I think that You can use enum in code and bind eventuatly to it.

null - unknow ; 0 - male ; 1 - female;

or you can use bool type to define this

null - unknow; true - male; false - female

  • 1
    Database design is supposed to make things easier to understand. `gender == true` is insane, and `gender == 1` isn't much better. Also I think half the world's population will be insulted by the idea that `female == false` (although obviously not half of the world's population of programmers) – APC Aug 18 '10 at 16:55
  • I never sad that this is good, but possible, everything depend of the business needs. – Damian Leszczyński - Vash Aug 18 '10 at 17:22
  • 2
    gender: true OR sex: yes please – yeffach nollid Aug 18 '10 at 19:20
  • 5
    @APC I think the world could understand defining female as "0" and male as "1", there's some pretty obvious symbology going on there :p In before comment flagged as offensive! – user229044 Aug 18 '10 at 19:25
  • -1 because it's a very poor answer (and because by definition a bool value can't be NULL - it's either true or false). – Ken White Apr 28 '11 at 22:26
  • You have right about that boolean value can't be NULL, but the column that type is BOOLEAN can contain NULL entries. Then any comparision action ends with UNKNOWN (if ANSI_NULLS is on). – Damian Leszczyński - Vash Apr 29 '11 at 13:09