3

Defination of Superkey and Primary key in wikipedia

A superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple.

and

The primary key has to consist of characteristics that cannot be duplicated by any other row. The primary key may consist of a single attribute or a multiple attributes in combination.

I've gone through many books and surfed on internet but what i found in them is what is primarykey and what is superkey.

But what i want to know is why superkey is required when we can identify a tuple uniquely through primarykey ?

giusti
  • 3,156
  • 3
  • 29
  • 44
Praveen Dabral
  • 2,449
  • 4
  • 32
  • 46
  • possible duplicate of [What are the differences between a superkey and a candidate key?](http://stackoverflow.com/questions/4519825/what-are-the-differences-between-a-superkey-and-a-candidate-key) – Bill Karwin Apr 19 '13 at 01:20
  • I've already mentioned that i know what primary key and superkey are, but what i want to know is the reason behind their evolution, when we already have a key that can identify tuples uniquely then why do we need the other one. – Praveen Dabral Apr 19 '13 at 06:49

5 Answers5

4

Superkeys are defined for conceptual completeness. You never need a superkey for reference purposes. A reference to a primary key will do just fine.

The concept of superkeys can be useful when you are analyzing a body of data in order to discover all the functional dependencies in it.

Once you have discovered a key, the next question is whether or not it is a superkey. If is is, you turn your attention to the candidate key contained in the superkey.

Walter Mitty
  • 18,205
  • 2
  • 28
  • 58
3

Let's define what these terms mean in the first place:

  • A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
  • A minimal1 superkey is called "candidate key", or just "key".
  • All keys in the same table are logically equivalent, but for historical and practical reasons we choose one of them and call it "primary", while the remaining are "alternate" keys.

So, every primary key is key, but not every key is primary. Every key is superkey, but not every superkey is key.

Constraints that physically enforce keys in the database are: PRIMARY KEY constraint (for primary key) and UNIQUE constraint (for alternate key). These constraints should not be created on all superkeys, only on keys.

It is not unusual to have multiple keys in the same table, depending on the nature of your data. For example, a USER table might have unique USER_ID and unique USER_NAME. Since both of them need to be unique on their own, you must create2 both keys, even though only one of them is strictly needed for identification.


1 That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.

2 I.e. create PRIMARY KEY or UNIQUE constraint.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • From wikipedia-- The primary key has to consist of characteristics that cannot be duplicated by any other row. **The primary key may consist of a single attribute or a multiple attributes in combination.** When a primary key can have multiple attributes in combination then what is the use of superkey? – Praveen Dabral Apr 18 '13 at 19:31
  • 1
    @pd4627 Yes, and this is true for alternate key as well. The concept of superkey is mainly a mental device for data modeling - first you identify superkeys then "narrow" them down to keys (only keys are actually enforced in the database). That doesn't mean a key cannot be composite (i.e. be a combination of multiple attributes). – Branko Dimitrijevic Apr 18 '13 at 19:56
  • 1
    @pd4627 Let me expand on superkeys a bit... Imagine you have a key. **Every** superset of that key is superkey, including the whole row! As you can see, there can be a large number of superkeys, many of them partially overlapping. As such, this is not a practical concept - it's there just to classify and reason about data modeling more easily. – Branko Dimitrijevic Apr 18 '13 at 20:03
2
  • A word key is usually a short for a candidate key.
  • Superkey means a super-set of a key (key attributes and some more).
  • Irreducible superkey is called a candidate key. (Irreducible means that if you remove one attribute, it is not a key any more); in general, there is more than one candidate key for a given relation (actually a relational variable).
  • One candidate key that designer choses to prefer (for some reason) is called the primary key.

This was on a logical level, keys are defined for relational variables, so called relvars.

In physical implementation:

  • Relvar maps to a table.
  • Primary key to the primary key of the table.
  • Other candidate keys (except PK) map to alternate keys (unique not null).
Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
  • But the answer is not justified here, as we can identify a tuple uniqely through primary key then why superkey is required? – Praveen Dabral Apr 18 '13 at 19:37
  • 1
    @pd4627 cont.. By definition, a relation does not have duplicate tuples, hence a header (set of all attributes) in a relational variable is a superkey; and from this point you start reducing (removing attributes) to obtain candidate keys. Finally you choose one you like to be the primary key. – Damir Sudarevic Apr 18 '13 at 20:02
  • 1
    @pd4627 If **A** is subset of **B** -- can there be **A** if **B** does not exists? Primary key is a subset of a superkey. Also each set is subset of itself -- so primary key is a superkey too. – Damir Sudarevic Apr 18 '13 at 21:34
2

A primary key is a superkey. Having only one such key constraint and only one way to identify tuples isn't necessarily sufficient.

Firstly, the relational model's versatility derives very much from the fact that it does not predetermine how data can or should be accessed in a table. A user or application is free to query a table based on whatever set of attributes may be necessary or convenient at the time. There is no obligation to use a "primary" key, which may or may not be relevant for some queries.

Secondly, uniqueness constraints (usually on candidate keys) are a data integrity feature. They guarantee data isn't duplicated in the key attributes. That kind of constraint is often useful on more than one set of attributes where business rules dictate that things should be unique. Uniqueness of one thing alone obviously doesn't guarantee uniqueness of another thing.

Thirdly, the query optimiser can take advantage of any and all keys as a way of optimising data access through query rewrites. From the optimiser's point of view the more keys it has to work with in a table the better.

nvogel
  • 24,981
  • 1
  • 44
  • 82
1

I think superkey is just part of the relational algebra abstraction - your primary key is (likely) to be the minimal superkey but you might have other superkeys whereas you only have one primary key.

rjp
  • 1,942
  • 12
  • 14
  • but the core function they are doing is identifying a tuple uniquely, when one key can do it efficiently then why other key is required? – Praveen Dabral Apr 18 '13 at 08:33
  • @pd4627 rjp got it right, there is one primary key which is also one superkey - there _might_ also be other superkeys (or unique entries) of which some _might_ be possible candidates for a primary key. – Argeman Apr 18 '13 at 11:04
  • but why the concept of superkey was evolved when we already had primary key to identify unique tuples? – Praveen Dabral Apr 18 '13 at 11:10
  • 1
    I suspect superkey was first and primary key is just a nice simple way of referring to the minimal superkey. Verifying that would require delving into the early days of relational calculus and a) my Big Book Of Scary Database Stuff is miles away and b) the sun isn't even up yet. – rjp Apr 19 '13 at 04:09