4

I am migrating an old database (oracle) and there are few tables like CountryCode, DeptCode and RoleCodes, their primary key is string (Codes) and i am thinking about adding Number column as a primary key because it would work fast with joins. These tables are not really big.

I am wondering if primary key for those tables should start from number '1' or it can be started from 100 just to differentiate b/w tables PK although i don't think i would be showing them on reports.

user1263981
  • 2,953
  • 8
  • 57
  • 98
  • You can normally set the initial seed in the table designer (in MSSQL server) - this is completely fine to do. – Dave Bish Jul 19 '12 at 11:04
  • 3
    It doesn't really matter what is the value of the primary key. As long as a *single* table doesn't contain duplicate keys. Relationship between two primary keys of two tables shouldn't bother you. You should only think about primary key of one table and a foreign key of another table (these must match). – oleksii Jul 19 '12 at 11:08

9 Answers9

3

You shouldn't do it to differentiate between tables. That is just not practical.

Not all primary keys have to start at 1, as in the case of an order number.

Daniel A. White
  • 187,200
  • 47
  • 362
  • 445
3

For sequence-generated IDs, I would suggest starting at different values if it's easy to do (depends on your database etc). You shouldn't be using this to differentiate between them in code, but it can make testing more reasonable.

Before now, I've had a situation where I've accidentally used a foreign key one table as if it were the foreign key for another table. The tests passed as the IDs were coincidentally the same. After we discovered the problem, we changed the initial seed and found the tests were a lot clearer.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
2

The rationale you're using to switch to an integer primary key doesn't seem valid: the performance gain you'd see using an INT rather than the original codes (which I assume are strings) will be negligable. The PK is always indexed, and indexes for strings or numerics are as good as instant. So unless you really need an INT, I'd be tempted to stick with the original data-type and work with the original data - simplifies data migration (which is something that should be considered whilst doing any work).

Chris J
  • 30,688
  • 6
  • 69
  • 111
  • thanks, thats what i was just thinking about. Also, is this okay to leave the EmpCode(Varchar2) as a PK? There are about 600 rows in this table referencing to timesheet and employee history table. – user1263981 Jul 19 '12 at 11:35
  • I don't see why not. When coming up with reference data tables, the choice of index is more important than whether your coded field is a string or number. If ints are quicker, it's simply because they're fixed 4-byte values whereas strings maybe random in length (up to the maximum size of the string field). So if your coded field is (say) VARCHAR2(10), then there'd be no problem. I'd worry though if the string size was significantly longer though as then it's not really a code ("code" implies a short string). – Chris J Jul 19 '12 at 12:06
  • The advantage, though, of having a hidden primary key such as an int or guid is that you can later change the value of the code column without breaking all your existing data that has foreign keys pointing at that code. – Steven Doggart Jul 19 '12 at 12:23
  • EmpCode length is Varchar2(20) and its like 'UK100', 'UK102', 'FR101' AND etc. After fisrt two characters i would have to create a sequance or some function to get the new value. – user1263981 Jul 19 '12 at 12:27
  • @SteveDog - I guess it depends why you've got a code column. To me if you've got a code, this is a short form of something else, so you may well have a list of country codes based on ISO3166 which maps codes to full names. It's very unlikely a code would change. If you decided to stick a unique number and PK that (giving a surrogate key), then you'd still also need a unique index on code (effectively giving one a natural key) to prevent duplicate codes, so you gain very little, if anything. I guess this comes down to the data: when you know what the data is, then you can decide on the best PK. – Chris J Jul 19 '12 at 14:56
  • @user1263981: I've indexed VARCHAR(20), although this is at the limit of what I'd define as sane. However, if all your codes take the form AANNN then why have a varchar(20)? Could this be a varchar(10) or even char(5) - you know your data though, but VARCHAR(20) just looks too long for the data you've provided. However it sounds like if you need to create a composite code of AANNN then I'd split this into two: a simple 'Prefix' field of CHAR(2) and an identity field of INT, then turn EmpCode into a computed field that concats the two and stick an index on there that. – Chris J Jul 19 '12 at 15:05
  • VARCHAR(20) was defined in the current system and i will defiantly reduce it. I like you idea of having three columns (Prefix,INT,Code) but in the current system all the emp codes are related to countries for an example UK employee code starts with UK001 and french employee code starts with FR001 and so on. Also i need to keep the employee history in the History table (slow dimension change) how would that fit into your option? I will also be migrating all the employee codes into new system becuase it reffers into Timesheet table which very big. – user1263981 Jul 19 '12 at 15:24
  • It sounds like there's something more going on: you're migrating from Oracle to what database - another Oracle? SQL Server? You're also stating that your codes need to restart for each country (i.e., UK001, UK002, UK003 [...] but also FR001, FR002, FR003) - correct? I think you need to state the background in your question so we've got a better picture of what you're (ultimately) trying to achieve, rather than get help piecemeal as this is now deviating wildly from your original question...! – Chris J Jul 19 '12 at 15:33
  • I am rewriting the timesheet application including redesigning database, we only use oracle so migration will be on same database (oracle to oracle). Current employee codes are in Alphanumeric forms i.e. ‘UK001’, ‘UK002’, ‘FR001’,’FR002’ and this employee code field is being used as a primary key. Employee table is also linked to timesheet and other tables where the EmpCode is being referred as FK. I was thinking about adding new INT column in the new database and keep it as PK but now as you have also mentioned it is better to keep the EmpCode as a PK and reduce the length of datatype. – user1263981 Jul 19 '12 at 15:46
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/14143/discussion-between-user1263981-and-chris-j) – user1263981 Jul 19 '12 at 15:49
  • As this discussion is not related to this post, i have moved it to http://stackoverflow.com/questions/11578380/is-this-okay-to-have-a-alphanumeric-field-as-a-primarykey – user1263981 Jul 20 '12 at 11:40
1

It is very common for example in ERP systems to define number ranges that represent a certain group of items.

This can be both as position in a bigger number, e.g.

1234567890
   | |
   index 4 - 6 represents region code
   index 7 - 8 represents dept code... 

or, as I suspect in your case, parts at the same place, like

1000 - 1999 Region codes
2000 - 2999 DeptCode
3000 - 3999 RoleCode

Therefore: No, it not necessarily starts with 1.

Bigger ERP Systems have even configuration sections for number ranges!

Now, from a database point of view:

Yes, your tables should always have a primary key! Having one will tremendously improve performance on average cases. (but in most database systems, if you do not provide one, one will be set by the DBMS which you do not see and can not handle. Some DBMS even create indices, but thats another story)

Mare Infinitus
  • 8,024
  • 8
  • 64
  • 113
1

I think it does not matter the start number or the start value that will hold the primary key .
What is important is that they will be represented in the FK of the join tables with the same values that are in the PK of the MAIN table .

Up_One
  • 5,213
  • 3
  • 33
  • 65
1

A surrogate key can have any values, as long as they are unique. That's what makes it "surrogate" after all - values have no intrinsic meaning on their own, and shouldn't generally even be shown to the user. That being said, you could think about using different seeds, just for testing purposes, as Jon Skeet suggested.

That being said, do you really need to introduce a new (surrogate) key? The existing natural key could actually lead to less1 JOINS, and may be useful for clustering. While there are legitimate uses for surrogate keys, don't do it just becaus it is "fashionable" - always be aware of the tradeoffs you are making and pick the right balance for you concrete needs.


1 It is automatically "propagated" down foreign keys, so you don't need to JOIN the child table to the parent just to get the natural key - natural key is already in the child.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • +1, but what makes a surrogate a surrogate is that it takes the place of a natural key. (*Surrogate* means "substitutes for" or "takes the place of" something.) – Mike Sherrill 'Cat Recall' Jul 19 '12 at 14:20
  • @Catcall I can't really argue English, as it is not my native language. But in databases, a surrogate is not just any replacement, it's a replacement of a different _quality_. Alternate key could just as easily take place of the primary key, yet we don't (necessarily) call it "surrogate". We only call it surrogate if its values have a certain quality (lack of meaning). – Branko Dimitrijevic Jul 20 '12 at 00:06
0

Doesn't matter what int the primary key starts from. Assuming the codes aren't updated regularly, I don't believe that int will be any faster. It more heavily depends on it being a varchar or of a known size.

M Afifi
  • 4,645
  • 2
  • 28
  • 48
0

I personally always have an field names "Id" as a primary key to a table, defined as an int or a bigInt if necessary.

If the table matches up to an enumerated type then I make sure the Id matches the EnumeratedType id which can be any number - so no it doesn't need to start from 1.

If it doesn't match an enumerated type, then I will usually use an auto-incrementing key starting from 1 but this is not always needed.

Note - that if the number of rows is small, then the difference between indexing on a number and on a varchar will be negligible.

BonyT
  • 10,750
  • 5
  • 31
  • 52
  • But the table where the codes are being used is quite big so so you think JOINS would run faster on Primary Key(Varchar2 datatypes) – user1263981 Jul 19 '12 at 11:12
  • I don't know what you mean by "big", but yes - using an integer instead of a varchar for joins WILL be faster - the larger the table, the greater the performance difference. Personally I try never to join on an ID. – BonyT Jul 19 '12 at 12:48
0

yes, it does'nt matter what integer it start from, it main use is define row uniquely and relationship among other table.

Samar
  • 209
  • 3
  • 9