Best Practices when Choosing SQL Keys Types

Question

I am very new to the SQL database. I need to create a database for my internship as our DA resigned suddenly.

The data is available but it is not inputted into a database yet. I am trying to follow the tutorials online but got stuck on what to choose for the different key types.

I hope to get the feedback of more experienced folks to get your guidance.

Table columns:

entry id (unique)
entry timestamp
username (unique but can appear more than once if the same user input a new meal record)
email address
user first and last name
meals taken date
meal type
meal calories
meal duration
meal cost
meal location
user notes

For primary key = entry id

For candidate key, I will pick username & and entry ID. Are there other columns that I should select as candidate keys? Would email make more sense? But a username can be repeated if they input another meal record. Does that matter?

For compound Key = 
email address + user first and last name?
record date + user name?

Are there other keys I need to classify?

Online tutorials these are the most basic keys I need to identify. But I am not sure if I am making the right choice. I appreciate any feedback.

Are you familiar with the idea of Normalization in database design? It looks like some of those attributes probably should go in other tables. If username is not unique then it isn't a key (it might be one *part* of a composite key). You shouldn't just assume keys based on column names. Keys have to be based on an understanding of the business requirements, not just the data alone. — nvogel, Feb 16 '20 at 03:33
Time to follow a published academic textbook on information modelling, the relational model & DB design & querying. (Manuals for languages & tools to record & use designs are not such textbooks.) (Nor are wiki articles or web posts.) Dozens of published academic information modeling & DB design textbooks are online free in pdf. stanford.edu has a free online course. (But asking for resources outside SO is off-topic.) PS This question effectively asks us to rewrite a textbook (with bespoke tutorial). Follow one & ask 1 specific question about where you are 1st stuck. — philipxy, Feb 16 '20 at 04:17
"unique but can appear more than once" is a contraction. "For Primary Key = entry id For Candidate Key, I will pick username & and entry ID." A relational PK is a CK & a CK cannot contain another CK. Although an SQL PK actually corresponds to a relational superkey. A "compound" key is just one that has more than one column. You really need to read a textbook. — philipxy, Feb 16 '20 at 04:22

score 2 · Answer 1 · answered Feb 16 '20 at 04:03

Your data seems to contain multiple entities. Based on your simple description, I can identify:

users
meals
locations

Then there seems to be this thing called a entries which is a user, eating (buying?) a meal at a location. This is a 3-way junction table among the entities.

This is a guess on what you are trying to represent. But it sounds like multiple tables.

Schwern · Answer 2 · 2020-02-16T05:09:48.127

I don't know what database you're using, I'll use Postgres because it's free, follows the SQL standard, has good documentation, and is very powerful.

As Gordon said, you seem to have three things: users, meals, and locations. Three things means one table for each. This avoids storing redundant data. The whole topic is database normalization.

create table users (
    id bigserial primary key,
    username text not null unique,
    email text not null unique,
    first_name text not null,
    last_name text not null
);

create table meals (
    id bigserial primary key,
    type text not null unique,
    -- If calories vary by user, move this into user_meals.
    calories integer not null
);

create table locations (
    id bigserial primary key,
    -- The specific information you have about a location will vary,
    -- but this is a good start. I've allowed nulls because often
    -- people don't have full information about their location.
    name text,
    address text,
    city text,
    province text,
    country text,
    postal_code text
);

You asked about compound keys. Don't bother. There's too many potential problems. Use a simple, unique, auto-incrementing big integer on every table.

Primary keys must be unique and unchanging. Usernames, names, email addresses... these can all change. Even if you think they won't, why bake the risk into your schema?

Foreign keys will be repeated all over the database and indexes many times. You want them to be small and simple and fixed size to not use up any more storage than necessary. Integers are small and simple and fixed size. Text is not.

Compound primary keys potentially leak information. A primary keys refer to a row, and those often show up in URLs and the like. If we were to use the user's email address or name that risks leaking personally identifiable information.

I've chosen bigserial for the primary key. serial types are auto-incrementing so the database will take care of assigning each new row a primary key. bigserial uses a 64 bit integer. A regular integer can only hold about 2 to 4 billion entries. That sounds like a lot, but it isn't, and you do not want to run out of primary keys. bigserial can handle 9 quintillion. An extra 32 bits per row is worth it.

Some more notes.

Everything is not null unless we have a good reason otherwise. This will catch data entry mistakes. It makes the database easier to work with because you know the data will be there.
Similarly, anything which is supposed to be unique is declared unique so it is guaranteed.
There are no arbitrary limits on column size. You might see other examples like email varchar(64) or the like. This doesn't actually save you any space, it just puts limits on what is allowed in the database. There's no fundamental limit on how long a username or email address can be, that's a business rule. Business rules change all the time; they should not be hard coded into the schema. Enforcing business rules is the job of the thing inputting the data, or possibly triggers.

Now that we have these three tables, we can use them to record people's meals. To do that we need a fourth table to record a user having a meal at a location: a join table. A join table is what allows us to keep information about users, meals, and locations each in one canonical place. The users, locations, and meals are referred to by their IDs known as a foreign key.

create table user_meals (
    id bigserial primary key,

    user_id bigint not null references users(id),
    meal_id bigint not null references meals(id),
    location_id bigint not null references locations(id),

    -- Using a time range is more flexible than start + duration.
    taken_at tstzrange not null,

    -- This can store up to 99999.99 which seems reasonable.
    cost numeric(7, 2) not null,

    notes text,

    -- When was this entry created?
    created_at timestamp not null default current_timestamp,
    -- When was it last updated?
    updated_at timestamp not null default current_timestamp
);

Because we're using bigserial for everything's primary key, referring to other tables is simple: they're all bigint.
As before, everything is not null unless we have a good reason otherwise. Not every meal will have notes, for example.
Fairly standard created_at and updated_at fields are used to record when the entry was created or updated.
Imprecise numeric types such as float or double are avoided, especially for money. Instead the arbitrary precision type numeric is used to store money precisely.
numeric requires we give it a precision and scale; we must choose a limit, so I picked something unreasonably high in my currency. Your currency may vary.
Rather than storing a start time and meal duration, we take advantage of Postgres's range types to store a time range from when the meal started to when it ended. This allows us to use range functions and operators to make queries about the meal time much simpler. For example, where taken_at @> '2020-02-15 20:00' finds all meals which were being taken at 8pm on Feb 15th.

There's more to do, such as adding indexes for performance, hopefully this will get you started. If there's one take away it's this: don't try to cram everything into one table.

Try it out.

Best Practices when Choosing SQL Keys Types

2 Answers2