0

For example, let's say I have a table like this:

CREATE TABLE people (
    firstname TEXT,
    lastname TEXT,
    weight FLOAT,
    zodiac_sign TEXT    
);

The first three columns are going to have many distinct values, and the number of them will grow without bound as I add more rows. But zodiac_sign will always be one of 12 values.

I'm assuming that SQLite will use 11 bytes for every instance of 'sagittarius' (i.e. that it's not smart enough to infer that zodiac_sign is basically an enum that can be stored in a single byte).

Does this suggest that, if the number of rows I'll be dealing with is non-trivial, I should split off another table like this:

CREATE TABLE people (
    firstname TEXT,
    lastname TEXT,
    weight FLOAT,
    zodiac_id INTEGER NOT NULL REFERENCES zodiac_signs(zodiac_id)
);

CREATE TABLE zodiac_signs (
    zodiac_id INTEGER PRIMARY KEY,
    name TEXT
);

And would this still be a good practice for a text column that holds a small number of distinct values but which isn't constrained to some set of values that will never change? e.g. if I had a column for country of birth.

Coquelicot
  • 8,775
  • 6
  • 33
  • 37
  • 2
    Yes, that's generally the accepted and "correct" way of doing things. – Dave Zych Sep 02 '15 at 22:40
  • possible duplicate of [Difference between storing integer or string in database table](http://stackoverflow.com/questions/8064743/difference-between-storing-integer-or-string-in-database-table) – Bulat Sep 02 '15 at 23:18

1 Answers1

1

In the first table design, the field zodiac_sign could just as well be renamed enter_anything_you_want. Good data integrity practices demand that the domain of each field be constrained whenever it is meaningful. For example, for general use, a birthdate field may be constrained only to be in the past, never in the future. For a DMV database, the same field may be constrained to be at least 16 years in the past.

When the field is text with a fixed number of valid alternatives, you could define it with a check constraint:

zodiac_sign text (check zodiac_sign in( 'Leo', 'Cancer', ... )),

(The examples I give may not directly translate into SQLite code. But I wanted to make this a more general discussion.)

However, this is somewhat awkward and would have to be repeated in every table containing that field. Meaning that should the list ever change, there could be a lot of alter table commands needed to update the new list. Although it may seem obvious that zodiac signs are not likely to change, that is not generally a dependable aspect of such fields.

Implementing a lookup table of the textual values, giving them a key value for foreign key reference is a much better way. It limits the valid values just like a check constraint, but all the values are in one location so changes are localized. Plus, additional fields could be added.

create table Zodiac(
    ID        integer not null,
    Name      text    not null,
    StartDate date,
    EndDate   date,
    constraint PK_Zodiac primary key( ID )
);

The dates would make calculating which sign to associate with a given date a lot easier. (Yes, Sqlite does not have a native Date datatype -- use whatever type is already your preference.)

TommCatt
  • 5,498
  • 1
  • 13
  • 20