8

I am very new to data modeling, and according to Microsoft's Entity Framework, tables without primary keys are not allowed and apparently a bad idea. I am trying to figure out why this is a bad idea, and how to fix my model so that I do not have this hole.

I have 4 tables in my current model: User, City, HelloCity, and RateCity. It is modeled as shown in the picture. The idea is that many users can visit many cities, and a user can only rate a city once, but they can greet a city many times. For this reason, I did not have a PK in HelloCity table.

Any insight as to how I can change this to comply with best practices, and why this is against best practices to begin with?

enter image description here

KateMak
  • 1,619
  • 6
  • 26
  • 42
  • If you can add a PK do it, they are cheap and realllllly nice to have. Every table I have has an autoinc primary key. Even if I dont use it. your question is probably better suited for codereview – Nix Sep 22 '16 at 22:45
  • Just make both userid and cityid your pk. It will have the added bonus that you won't have to mess with duplicates. – Matthew Whited Sep 22 '16 at 22:48
  • But then I cannot add more than one greeting for a user....the idea is the user can add as many greetings as they wish... – KateMak Sep 22 '16 at 22:48
  • 4
    Primary keys are used internally to optimize queries, but the real question is...if you wanted to delete just one record in HelloCity, how would you do it? You have nothing to uniquely identify a row. Say there was a glitch that caused the same Greeting to be added twice; you would have two rows that contained exactly the same data and you couldn't delete just one. – Paul Abbott Sep 22 '16 at 22:50
  • Create a HelloId primary key, make it auto increment. Then that's it. You don't have to do anything with it. User can add greeting for a city as many times as they want since the HelloId is auto increment. – Eric Sep 22 '16 at 22:54
  • Was thinking either that or have the UserId, CityId, and Timestamp be the pk – KateMak Sep 22 '16 at 22:54
  • 1
    As @PaulAbbott said, PKs are mainly useful to identify a single row in case you want to delete it, but more importantly to *avoid having inconsistencies*. It's always good to specify in your database what field combination must not be duplicate, because otherwise you may end up with a real mess. Your table structure must reflect your business logic. If the compound PK involves too many fields, you may want to create an identity field (PK) and create a unique index for all those fields. – Andrew Sep 22 '16 at 22:55
  • @PaulAbbott I up voted your comment and agree with your optimization statement but for devils advocate ;WITH cte AS (SELECT *, rownum = ROW_NUMBER() OVER (PARTITION BY Col1,Col2,Col3 ORDER BY (SELECT 0)) FROM TableName) DELETE FROM cte WHERE rownum > 1. Yep just deleted the duplicate without a unique id. – Matt Sep 22 '16 at 23:00
  • Does this answer your question? [Should each and every table have a primary key?](https://stackoverflow.com/questions/840162/should-each-and-every-table-have-a-primary-key) – philipxy May 06 '23 at 22:12

4 Answers4

16

This response is mainly opinion/experience-based, so I'll list a few reasons that come to mind. Note that this is not exhaustive.

Here're some reasons why you should use primary keys (PKs):

  1. They allow you to have a way to uniquely identify a given row in a table to ensure that there're no duplicates.
  2. The RDBMS enforces this constraint for you, so you don't have to write additional code to check for duplicates before inserting, avoiding a full table scan, which implies better performance here.
  3. PKs allow you to create foreign keys (FKs) to create relations between tables in a way that the RDBMS is "aware" of them. Without PKs/FKs, the relationship only exists inside the programmer's mind, and the referenced table might have a row with its "PK" deleted, and the other table with the "FK" still thinks the "PK" exists. This is bad, which leads to the next point.
  4. It allows the RDBMS to enforce integrity constraints. Is TableA.id referenced by TableB.table_a_id? If TableB.table_a_id = 5 then, you're guaranteed to have a row with id = 5 in TableA. Data integrity and consistency is maintained, and that is good.
  5. It allows the RDBMS to perform faster searches b/c PK fields are indexed, which means that a table doesn't need to have all of its rows checked when searching for something (e.g. a binary search on a tree structure).

In my opinion, not having a PK might be legal (i.e. the RDBMS will let you), but it's not moral (i.e. you shouldn't do it). I think you'd need to have extraordinarily good/powerful reasons to argue for not using a PK in your DB tables (and I'd still find them debatable), but based on your current level of experience (i.e. you say you're "new to data modeling"), I'd say it's not yet enough to attempt justifying a lack of PKs.

There're more reasons, but I hope this gives you enough to work through it.

As far as your M:M relations go, you need to create a new table, called an associative table, and a composite PK in it, that PK being a combination of the 2 PKs of the other 2 tables.

In other words, if there's a M:M relation between tables A and B, then we create a table C that has a 1:M relation to with both tables A and B. "Graphically", it'd look similar to:

+---+ 1  M +---+ M  1 +---+
| A |------| C |------| B |
+---+      +---+      +---+

With the C table PK somewhat like this:

+-----+
|  C  |
+-----+
| id  |  <-- C.id = A.id + B.id (i.e. combined/concatenated, not addition!)
+-----+

or like this:

+-------+
|   C   |
+-------+
| a_id  |  <--|
+-------+     +-- composite PK columns instead
| b_id  |  <--|   of concatenation (recommended)
+-------+
code_dredd
  • 5,915
  • 1
  • 25
  • 53
  • Hi, I was reading your answer, and I had a question. Why is it that some RDBMS dont allow to use primary key? For example, Azure synapse analytics doesnt allow to use primary key in enforced manner. Here is the [docs](https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-table-constraints) – Arun T Mar 05 '22 at 05:41
  • 1
    I'm convinced that I've come across a legitimate and common use case for a no-PK table: If a table "owns" an unordered list of items which are deleted/replaced when the owner row is updated, not having an ID discourages other developers and future you from trying to make a FK onto the list items. This makes queries and client-side management significantly easier than tracking PKs and doing inserts and updates explicitly on each list item, and I can't think of any relational problems it may cause. – Rei Miyasaka Mar 18 '22 at 20:03
  • hi this is kinda old but I wonder what about unique key ? we can use foreign key to refer to unique key and (maybe) having the same effect as pkey... as a result we don't need pkey (but more memory for non clustred index I guess) – aanhlle Apr 04 '22 at 03:34
  • @aanhlle A primary key *is* a unique key. If what you meant to say was to use a unique *index* as if it were a primary *key*, then no, they're not the same at all. You *cannot* use a table's *index* as a foreign *key* in another table. So, yes, you *do* need a pkey. – code_dredd Apr 06 '22 at 00:33
  • @code_dredd thanks, I'm a bit confused according to this question in the link below. If I read it correctly then "we can have a foreign key that references a unique index in another table.", which might contradict to what you stated.. https://stackoverflow.com/questions/2895219/can-we-have-a-foreign-key-which-is-not-a-primary-key-in-any-other-table – aanhlle Apr 06 '22 at 10:06
  • @aanhlle Yes, but a comment under the accepted answer notes the following: "This is correct in Microsoft SQL Server but it isn't standard SQL and isn't supported by all other DBMSs. In ISO standard SQL there is no such thing as an index and FOREIGN KEY constraints are always required to match UNIQUE or PRIMARY KEY constraints." So, basically, for reasons that, IMO, defy logic, Microsoft allows developers to do non-standard things that, even if allowed, are actually bad ideas and bad practices overall. In short, even if you *can*, you *shouldn't*. I hope that helps. – code_dredd Apr 06 '22 at 11:26
3

Two primary reasons for a primary key:

  1. To uniquely identify a record for later reference.
  2. To join to other tables accurately and efficiently.
Andrew
  • 7,602
  • 2
  • 34
  • 42
M T Head
  • 1,085
  • 9
  • 13
3

A primary key essentially tags a row with a unique identifier. This can be composed of one or more columns in a row but most commonly just uses one. Part of what makes this useful is when you have additional tables (such as the one in your scenario) you can refer to this value in other tables. Since it's unique, I can look at a column with that unique ID in another table (say HelloCity) and instantly know where to look in the User table to get more information about the person that column refers to.

For example, HelloCity only stores the IDs for the User and City. Why? Because it'd be silly to re-record ALL the data about the City and ALL the data about the User in another table when you already have it stored elsewhere. The beauty of it is, say the user needs to update their DisplayName for some reason. To do so, you just need to change it in User. Now, any row that refers to the user instantly returns the new DisplayName; otherwise you would have to find every record using the old DisplayName and update it accordingly, which in larger databases could take a considerable amount of time.

Note that the primary key is only unique in that specific table though - you could theoretically see the same primary key value in your City and User tables (this is especially common if you're using simple integers as IDs) but your database will know the difference based on the relationship you build between tables as well as your JOIN statements in your queries.

Another way primary keys help is they automatically have an index generated on their column(s). This increases performance in queries where your WHERE clause searches on the primary key column value. And, since you'll likely be referring to that primary key in other tables, it makes that lookup faster as well.

In your data model I see some columns that already have 'Id' in them. Without knowing your dataset I would hope those already have all-unique values so it should be fine to place a PK on those. If you get errors doing that there are likely duplicates.

Back to your question about HelloCity - Entity Framework is a little finicky when it comes to keys. If you really want to play it safe you can auto-generate a unique ID for every entry and call it good. This makes sense because it's a many-to-many relationship, meaning that any combination can appear any number of times, so in theory there's no reliable way to distinguish between unique entries. In the event you want to drop a single entry in the future, how would you know what row to refer to? You could make the argument that you search on all the fields and the greeting may be different, but if there are multiple visits to a city with the same greeting, you may accidentally drop all of those records instead of just one.

However, if it was a one-to-one relationship you could get away with making the combination of both CityId and UserId the primary key since that combination should always be unique (because you should never see multiple rows making that same combination).

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Cory Gehr
  • 105
  • 1
  • 7
0

Late to the party but I wanted to add there are special cases where a table does not need to have a primary key, or any type of key.

Take the case of a singleton, for example. A table that includes a single row (or a very well known number of rows) always. The dual table in Oracle is one case.

Formally, the primary key for a singleton is (): that is, a key with no columns. I'm not aware of any database that allows it, though.

There are other cases where a PK is not needed, typically with log tables that usually are "end tables" since you typically draw them at the border of your diagram; no other tables refer to them (i.e. they have no children). A good use of indexes is enough to deal with them since, by their nature, they don't need to enforce row uniqueness.

But, to close, yes 99.99% of tables in a relational database should have a PK.

The Impaler
  • 45,731
  • 9
  • 39
  • 76