Can a database table be without a primary key?

Question

Can anyone tell me if a table in a relational database (such as MySQL / SQL SERVER) can be without a primary key?

For example, I could have table day_temperature, where I register temperature and time. I don't see the reason to have a primary key for such a table.

http://stackoverflow.com/questions/840162/should-each-and-every-table-have-a-primary-key — Daniel Vassallo, Mar 25 '10 at 12:58
@bodacydo - What you just described is a primary key. If you are registering temperatures (one per day) the PK becomes the datetime field. Without it you will run into very bad performance when you try to query your data table. — JonH, Mar 25 '10 at 12:58

score 43 · Answer 1 · answered Mar 25 '10 at 13:02

Technically, you can declare such a table.

But in your case, the time should be made the PRIMARY KEY, since it's probably wrong to have different temperatures for the same time and probably useless to record the same reading more than once.

Logically, each table should have a PRIMARY KEY so that you could distinguish two records.

If you don't have a candidate key in your data, just create a surrogate one (AUTO_INCREMENT, SERIAL or whatever your database offers).
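A minimal sketch of both choices, using Python's built-in sqlite3 module as a stand-in for MySQL or SQL Server. The column names `measured_at` and `temperature` and the `sensor_log` table are assumptions made for illustration, and SQLite's `AUTOINCREMENT` plays the role of `AUTO_INCREMENT` / `SERIAL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Natural key: the timestamp itself is the PRIMARY KEY (one reading per time).
conn.execute("""
    CREATE TABLE day_temperature (
        measured_at TEXT NOT NULL PRIMARY KEY,
        temperature REAL NOT NULL
    )
""")
conn.execute("INSERT INTO day_temperature VALUES ('2010-03-25 12:00:00', 11.5)")

# A second reading for the same time violates the key and is rejected.
try:
    conn.execute("INSERT INTO day_temperature VALUES ('2010-03-25 12:00:00', 12.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Surrogate key: no candidate key in the data, so the database numbers the rows.
conn.execute("""
    CREATE TABLE sensor_log (
        log_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        measured_at TEXT NOT NULL,
        temperature REAL NOT NULL
    )
""")
for _ in range(2):
    conn.execute(
        "INSERT INTO sensor_log (measured_at, temperature) VALUES (?, ?)",
        ("2010-03-25 12:00:00", 11.5),
    )

# Identical readings are both stored, but each has its own log_id.
log_ids = [row[0] for row in conn.execute("SELECT log_id FROM sensor_log ORDER BY log_id")]
```

With the natural key the database refuses the conflicting reading; with the surrogate key both rows are stored and can still be told apart by `log_id`.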

The only excuse for not having a PRIMARY KEY is a log or similar table that is subject to heavy DML, where maintaining an index would degrade performance beyond what you can tolerate.

Quassnoi

Celko would say it's not a table if it doesn't have a PK! – Martin Smith Mar 25 '10 at 13:04
@Martin: relational databases and `SQL` operate with multisets, not sets, so it would still be a table. However, internally, this is still a set (the records should be distinguished), and a surrogate `PK` just helps making the records distinguishable from the outside. – Quassnoi Mar 25 '10 at 13:08
+1 Interesting, I had never heard that before. Someone should tell Joe! http://www.eggheadcafe.com/software/aspnet/31906606/delete-only-9-of-10-rows.aspx – Martin Smith Mar 25 '10 at 13:24
@Martin: also in my blog: http://explainextended.com/2009/03/14/deleting-duplicates/ But good luck doing the same in `MySQL` :) – Quassnoi Mar 25 '10 at 13:39

Grzegorz Gierlik · Answer 2 · 2015-05-19T07:52:55.567

As always, it depends.

A table does not have to have a primary key. Having the correct indexes is much more important. How a primary key affects indexes depends on the database engine (e.g. it may create a unique index on the primary key column or columns).

However, in your case (and in 99% of other cases too), I would add a new auto-increment unique column such as temp_id and make it a surrogate primary key.

It makes maintaining the table much easier, for example finding and removing records (such as duplicates). Believe me, for every table there comes a time to fix things :(.
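As a rough illustration of the maintenance point, the sketch below removes duplicate readings by keeping the lowest `temp_id` in each group. It runs on Python's built-in sqlite3 module rather than MySQL, and the `measured_at` and `temperature` columns are assumed names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE day_temperature (
        temp_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        measured_at TEXT NOT NULL,
        temperature REAL NOT NULL
    )
""")
readings = [
    ("2010-03-25 12:00:00", 11.5),
    ("2010-03-25 12:00:00", 11.5),  # accidental duplicate of the first row
    ("2010-03-25 13:00:00", 12.1),
]
conn.executemany(
    "INSERT INTO day_temperature (measured_at, temperature) VALUES (?, ?)",
    readings,
)

# Keep the lowest temp_id in each group of identical readings, delete the rest.
conn.execute("""
    DELETE FROM day_temperature
    WHERE temp_id NOT IN (
        SELECT MIN(temp_id)
        FROM day_temperature
        GROUP BY measured_at, temperature
    )
""")

remaining = conn.execute(
    "SELECT temp_id, measured_at, temperature FROM day_temperature ORDER BY temp_id"
).fetchall()
```

Without `temp_id` there would be no column value that separates the two identical rows. Note that MySQL rejects a DELETE whose subquery reads the target table directly, so there the same idea is usually written with a derived table or a self-join.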

score 6 · Answer 3 · answered Mar 25 '10 at 12:56

If the possibility of having duplicate entries (for example for the same time) is not a problem, and you don't expect to have to query for specific records or range of records, you can do without any kind of key.

p.marino

Avinash Pawar · Answer 4 · 2019-10-18T08:49:06.667

Even if you do not add a primary key to an InnoDB table in MySQL, MySQL adds a hidden clustered index to that table. If you do not define a primary key, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

If the table has no primary key or suitable UNIQUE index, InnoDB internally generates a clustered index GEN_CLUST_INDEX on a synthetic column containing row ID values.

https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html

score 5 · Answer 5 · answered Mar 25 '10 at 12:57

You don't need a PK, but it's recommended that you have one. It's the best way to identify unique rows. Sometimes you don't want an auto-increment int PK, but would rather create the PK on something else. For example, in your case, if there's only one row per time, you should create the PK on the time. It makes lookups based on time faster, and it ensures that the times are unique (so you can be sure that data integrity isn't violated).

score 1 · Answer 6 · answered Mar 25 '10 at 12:57

The time would then become your primary key. It will help index that column so that you can query data based on say a date range. The PK is what ultimately makes your row unique, so in your example, the datetime is the PK.

JonH

score 1 · Answer 7 · answered Mar 25 '10 at 13:12

I would include a surrogate/auto-increment key, especially if there is any possibility of duplicate time/temperature readings. You would have no other way to uniquely identify a duplicate row.

Steve

score 0 · Answer 8 · answered Jul 03 '14 at 11:37

I ran into the same question with one of the tables I designed.

The problem was that the PK would have had to be composed of all the columns of the table. That works, but it means the table's size grows very fast with each row inserted.

I chose not to have a PK, and to have only an index on the column I do the lookup on.

score 0 · Answer 9 · answered Sep 11 '14 at 05:46

When you replicate a database in MySQL, a table without a primary key may cause delays in replication.

http://lists.mysql.com/mysql/227217

The most common mistake when using ROW or MIXED is the failure to verify that every table you want to replicate has a PRIMARY KEY on it. This is a mistake because when a ROW event (such as the one documented above) is sent to the slave and neither the master's copy nor the slave's copy of the table has a PRIMARY KEY on the table, there is no way to easily identify which unique row you want replication to change.

score 0 · Answer 10 · answered Mar 25 '18 at 14:28

Based on your question, I would consider three options:

Put a PK on both columns together. This allows multiple rows with the same temp or the same time, but never two rows with the same temp AND time.
Don't put a PK at all, but do put a single unique index containing both columns. This would allow NULLs in temp and time, but takes more space to maintain the index.

These two options would be best for retrieval speed if you have heavy reads, but would result in a lower insert rate, since the indexes would have to be updated as well.

Don't put any index at all, nor a PK. This would be best for inserts but very bad for searching. It is useful for logging where retrieval is done by another mechanism, or when the inserting device is not required to check for duplicates.
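The difference between the first two options can be sketched as follows, using Python's built-in sqlite3 module as a stand-in engine. The table names `reading_pk` and `reading_uq` and the helper `try_insert` are hypothetical, and the NOT NULL constraints are spelled out explicitly because SQLite does not imply them for a composite PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: composite PRIMARY KEY over both columns.
conn.execute("""
    CREATE TABLE reading_pk (
        measured_at TEXT NOT NULL,
        temperature REAL NOT NULL,
        PRIMARY KEY (measured_at, temperature)
    )
""")

# Option 2: no PK, one unique index over both nullable columns.
conn.execute("CREATE TABLE reading_uq (measured_at TEXT, temperature REAL)")
conn.execute("CREATE UNIQUE INDEX uq_reading ON reading_uq (measured_at, temperature)")


def try_insert(table, measured_at, temperature):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (measured_at, temperature))
        return True
    except sqlite3.IntegrityError:
        return False


noon = "2010-03-25 12:00:00"
pk_results = [
    try_insert("reading_pk", noon, 11.5),  # first row
    try_insert("reading_pk", noon, 12.0),  # same time, different temp
    try_insert("reading_pk", noon, 11.5),  # exact duplicate pair
    try_insert("reading_pk", noon, None),  # NULL temp
]
uq_results = [
    try_insert("reading_uq", noon, 11.5),  # first row
    try_insert("reading_uq", noon, 11.5),  # exact duplicate pair
    try_insert("reading_uq", noon, None),  # NULL temp
    try_insert("reading_uq", noon, None),  # NULL temp again
]
```

In this sketch the composite key accepts the same time with a different temp but rejects the repeated pair and the NULL, while the unique index accepts NULLs, even repeated ones. How repeated NULLs are treated in a unique index differs between engines, so this part should be checked against the database actually in use.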

Also, it is very important to consider cardinality here and think about the future consequences of using an auto-incremented number. If you're planning to do A LOT OF inserts, the counter can eventually run out; even an unsigned BIGINT is finite, although at roughly 1.8 × 10^19 values it would take an extreme insert rate to exhaust it. In your example I guess you'll be saving data daily, but for how long? It would be more of a concern if you saved the temp every minute, so I'll take that as an extreme example.

I guess it is best to think about what you need from the table. Are you doing "save-and-forget" for the temp at every minute for the entire year? Are you going to use this table frequently in real-time decision making in your business logic? I think it is best to segregate the data needed in real time (OLTP) from long-term data that is seldom required and whose retrieval latency is allowed to be high (OLAP). It's even worth duplicating the data into two different tables: one heavily indexed and purged once in a while to control cardinality, and a second that is stored on a magnetic disk with almost no indexes at all (it is possible to move a schema from your main file system to another one).

score 0 · Answer 11 · answered Aug 25 '23 at 06:58

I found a case where having no primary key makes sense. Think about watch time on video streaming platforms like YouTube, Netflix, etc. To track how long a user has watched a specific video, I would create the following table:

-- Deliberately no PRIMARY KEY: identical rows are allowed.
CREATE TABLE `video_view` (
  `user_id` INT NOT NULL,
  `video_id` INT NOT NULL,
  `datetime` DATETIME NOT NULL,
  `video_time` FLOAT NOT NULL,
  CONSTRAINT `fk_user_id`
    FOREIGN KEY (`user_id`) REFERENCES `user`(`id`)
    ON DELETE CASCADE,
  CONSTRAINT `fk_video_id`
    FOREIGN KEY (`video_id`) REFERENCES `video`(`id`)
    ON DELETE CASCADE
);

Now think of the situation where a user is watching the same video at the same time in two windows or on two different machines. It is very unlikely that they will watch the same part of the video at exactly the same time on two different devices, but it is possible. Therefore I chose not to create a primary key of user_id + video_id + datetime (+ video_time).

There are other possible solutions:

Add an artificial primary key (I don't want that, it gives me no advantage in this case)
Prevent saving a row with the same values (I don't want that, because that way I could not track that a user is watching the same video at the same time on two different devices)
Add an identifier for a source (tab / device / etc.)
Add another attribute, count, which is incremented if this very unlikely case happens

Other answers have already mentioned that in case of a logging table it could make sense not to have a primary key. I would define my table as a logging table, but it will be used for example for a view count on a video.
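The "identifier for a source" alternative listed above could look roughly like this. It is only a sketch on Python's built-in sqlite3 module: `source_id` and `viewed_at` are hypothetical column names, and the foreign keys of the original table are left out to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_view (
        user_id    INTEGER NOT NULL,
        video_id   INTEGER NOT NULL,
        source_id  TEXT    NOT NULL,  -- hypothetical tab/device identifier
        viewed_at  TEXT    NOT NULL,
        video_time REAL    NOT NULL,
        PRIMARY KEY (user_id, video_id, source_id, viewed_at)
    )
""")


def record_view(user_id, video_id, source_id, viewed_at, video_time):
    """Store one watch event; return False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO video_view VALUES (?, ?, ?, ?, ?)",
            (user_id, video_id, source_id, viewed_at, video_time),
        )
        return True
    except sqlite3.IntegrityError:
        return False


now = "2023-08-25 06:58:00"
laptop_ok = record_view(1, 10, "laptop", now, 42.0)
phone_ok = record_view(1, 10, "phone", now, 42.0)     # same moment, other device
replay_ok = record_view(1, 10, "laptop", now, 42.0)   # same event sent twice

stored_rows = conn.execute("SELECT COUNT(*) FROM video_view").fetchone()[0]
```

With the source in the key, simultaneous views from two devices are both kept, while a repeated submission of the same event from one device is rejected.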

score -5 · Answer 12 · answered Mar 25 '10 at 13:01

I've got a better example of a table that doesn't need a primary key - a joiner table. Say I have a table with something called "capabilities", and another table with something called "groups", and I want a joiner table that tells me all the capabilities that all the groups might have, so it's basically

create table capability_group
(  capability_id varchar(32),
    group_id     varchar(32));

There is no reason to have a primary key on that, because you never address a single row - you either want all the capabilities for a given group, or all the groups for a given capability. It would be better to have a unique constraint on (capability_id, group_id), and separate indexes on both fields.

Paul Tomblin

But by giving it a unique constraint you are acknowledging it is a Candidate Key so why not just make it the PK? Implementation wise it would use an index to enforce the constraint anyway. – Martin Smith Mar 25 '10 at 13:07

In this case, `(capability_id, group_id)` should make a composite `PRIMARY KEY`. This will also make this table more efficient for the engines with clustered storage (`InnoDB`, `SQL Server` (clustered PK), `Oracle` (`ORGANIZATION INDEX`)) – Quassnoi Mar 25 '10 at 13:12
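The composite-key version suggested in these comments can be sketched like this, again on Python's built-in sqlite3 module as a stand-in engine. The index name `idx_capability_group_group` and the sample values are assumptions; the secondary index on `group_id` covers lookups in the direction the primary key does not lead with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE capability_group (
        capability_id TEXT NOT NULL,
        group_id      TEXT NOT NULL,
        PRIMARY KEY (capability_id, group_id)
    )
""")
# The PK index leads with capability_id; add one for lookups by group_id.
conn.execute("CREATE INDEX idx_capability_group_group ON capability_group (group_id)")

conn.executemany(
    "INSERT INTO capability_group VALUES (?, ?)",
    [("read", "admins"), ("write", "admins"), ("read", "guests")],
)

# The same pair cannot be stored twice.
try:
    conn.execute("INSERT INTO capability_group VALUES ('read', 'admins')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

capabilities_of_admins = [
    row[0]
    for row in conn.execute(
        "SELECT capability_id FROM capability_group "
        "WHERE group_id = ? ORDER BY capability_id",
        ("admins",),
    )
]
groups_with_read = [
    row[0]
    for row in conn.execute(
        "SELECT group_id FROM capability_group "
        "WHERE capability_id = ? ORDER BY group_id",
        ("read",),
    )
]
```

The composite key gives the same uniqueness guarantee as a separate unique constraint, and both query directions still work.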
