0

I'm trying to better understand the use of Primary Keys in SQL (any flavour). I'm having a bit of a block on understanding where they can be used.

From W3schools SQL Primary Key Definition a primary key is defined as:

The PRIMARY KEY constraint uniquely identifies each record in a table.

Primary keys must contain UNIQUE values, and cannot contain NULL values.

If I have the following table, where Student_id is the column to which I want to make a primary key (so use it to uniquely identify a record in a table):

Student_id subject_studied
E1          maths
E1          science
E1          english
B2          arts
V3          gym
C5          science
C5          maths

I assume that making Student_id the primary key is allowed in this case, as each record can be uniquely identified.

However if we take a different but similar table, with lots of duplicate records:

Student_id subject_studied
E1          science
E1          science
E1          science
B2          arts
V3          gym
C5          science
C5          science

then Student_id cannot be a primary key as it does not uniquely identify each record in the table.

Is this the correct understanding regarding primary keys? Assume for this example that there are no NULL values.

Most examples online, do not show a repeated value in a column when creating primary keys such as the first table in which Student_id is repeated several times but each record is a unique record.

appreciate any advice!

  • 3
    Your Student_id column contains duplicate values, so it can't be the PK. – jarlh Nov 25 '19 at 12:00
  • 1
    If the first table, you can use a composite primary key of (`student_id, subject_studied`), which means the *pair* must be unique. In the second, you can't. In neither table you can use only `student_id`. – GSerg Nov 25 '19 at 12:02
  • 1
    in this situation you should break you table into 3 tables students,subjects and studentSubjects with unique keys for studentSubjects PK will be both (student_id,subject_id) – Ahmed Yousif Nov 25 '19 at 12:03
  • 1
    The Student_id cannot be set as primary key in the table shown here, as the Student_id is repeathing there. If you have a table named student, you can mark student_Id as primary key. – Ajith Nov 25 '19 at 12:04
  • 3
    Since nobody's said it yet, I will: although the reference material on W3Schools is more reliable than it used to be, it's still not a particularly high-quality resource. You might want to spend some time looking around for a different tutorial that will explain basic database concepts like this. – IMSoP Nov 25 '19 at 12:19
  • I agree with IMSoP. – Thorsten Kettner Nov 25 '19 at 12:20

4 Answers4

2

You should have a students table that has one row per student with the student_id as a primary key. This table would probably have other information too:

create table students (
     student_id int primary key,
     name varchar(255),
     date_of_birth date,
     . . .
);

Your table then appears to be a junction table, identifying studies. In many databases, the subjects would be in a separate reference table, but I'll leave that out. This table can have a composite primary key:

create table student_subjects (
    student_id int references students(student_id),
    subject varchar(255),
    primary key (student_id, subject)
);

Note the foreign key reference to the students table.

Also note: I'm not a big fan of composite primary keys. I would typically add a specific primary key for the table:

create table student_subjects (
    student_subjects int auto_increment primary key,  -- "auto_increment" syntax varies by database
    student_id int references students(student_id),
    subject varchar(255),
    unique(student_id, subject)
);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

The primary keys are only unique. In your example the first table wont be able to have a primary key with the student_id (rememeber you care only about the column).

In order to have a primary key on the first table you need to have a double primary key with 2 columns. A primary key of 2 columns means that the 2 columns combinations are unique.

On your second table you can't create a primary key just with those columns. No column is unique and at the same time no combination of columns are unique.

It's not about the records, it's about the column(s).

A_kat
  • 1,459
  • 10
  • 17
2

As the definition states, Primary keys must contain UNIQUE values, and cannot contain NULL values. In your table, there are multiple duplicate values (such as three instances of E1), so that column cannot be used as a primary key.

For each primary key, you should only get one row. So, consider how many rows would be returned if ask for all the student_id whose value is "E1". You would get three rows, not one.

John Go-Soco
  • 886
  • 1
  • 9
  • 20
1

A primary key uniquely identifies a row in its table.

For instance a student table contains students. Each student is identified by an id. (Primary key is bold and italic):

  • student (student_id, first_name, last_name, ...)

You have a table of studied subjects. Its natural key would be the combination of student_id and subject. Usually you would also have a subject table:

  • student (student_id, first_name, last_name, ...)
  • subject (subject_id, name, ...)
  • student_subject (student_id, subject_id, grade, ...)

Some people prefer technical IDs. You can add a technical ID to the student_subject table and make this primary key. But the combined student_id and subject_id would still be a unique key (i.e. there should still be a unique constraint on them).

The last table you are showing wouldn't have a primary key. But then, what meaning would the entries in that table have? That student E1 studies science and science and science? It is extremely rare for a table not to have a primary key, so if you hit such a situation, reconsider your table design first.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • I'd slightly disagree with your last sentence. In most cases, a table without _any_ candidate keys indicates some problem with how you're modelling the data - what does it mean for the same row to exist twice or three times? On a practical level, such a table is effectively insert-only, since you can't target a particular row to delete or update. – IMSoP Nov 25 '19 at 12:15
  • @IMSoP: I agree, it's very rare. I see it happen most in settings tables where keys can be null for fallback, e.g. `setting(country_id, city_id, value)` where a country with a city has precedence over country + null and this has precedence over null + null. – Thorsten Kettner Nov 25 '19 at 12:27
  • Ah, yes, I hadn't thought about null entries rather than duplicate entries. However, I'd probably a) add an auto-increment/uuid primary key to make maintenance easier, and b) add a bunch of check and unique constraints that ensured every row was unique anyway, [using partial indexes as discussed here](https://stackoverflow.com/a/8289253/157957). I think it's better to start off learning "always have a Primary Key", and wait until you _really_ know what you're doing to decide you've hit one of the rare exceptions. – IMSoP Nov 25 '19 at 12:40
  • @IMSoP: I like your rationale and changed my answer accordingly, so as to point out that a table without a primary key is so rare that it should raise suspicion :-) – Thorsten Kettner Nov 25 '19 at 13:34