
In our DB (on SQL Server 2005) we have a "Customers" table, whose primary key is Client Code, a surrogate bigint IDENTITY(1,1) key; the table is referenced by a number of other tables in our DB through foreign keys.

A new CR implementation we are estimating would require us to change the ID column type to varchar, with the Client Code generation algorithm shifting from a simple numeric progression to a strict 2-character representation: codes range from 01 to 99, then progress like this:

1A -> 2A -> ... -> 9A -> 1B -> ... 9Z

I'm fairly new to database design, but I smell some serious problems here. First of all, what about this client code generation algorithm? What if I need a Client Code to go beyond the 9Z limit?

Then I have some questions: would this change be feasible, given that the table is already filled with a fair amount of data and referenced by multiple entities? If so, how would you approach this problem, and how would you implement Client Code generation?

Andrea Pigazzini
  • Why don't you add another column for the varchar ID and keep the old one as is? – Furqan Hameedi May 30 '11 at 08:06
  • I've added some more info to my answer, if you're interested. Don't forget to accept the answer if it was good for you ;-) – LeftyX May 30 '11 at 08:14
  • You're right. The new CR implementation is JURASSIC. Tell those dinosaurs to go crawl into a tarpit and go extinct already. You do not want to embed any meaningful data in the primary key; its sole purpose is to uniquely identify the row. A compromise would be to add a new column with a unique index on it to store the funky dino-legacy key. – Tim May 30 '11 at 13:30
  • @Tim Unfortunately, dinosaurs still seem to be alive and well... :-) – Andrea Pigazzini May 30 '11 at 14:06

4 Answers


I would leave the primary key as it is and would create another key (unique) on the generated client code. I would do that anyway: it's always better to have a short numeric primary key than a long char key. In some situations you might prefer a GUID (for replication purposes), but a number (int/bigint) is always preferable. You can read more here and here.
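A minimal sketch of that approach, assuming illustrative names (Customers, ClientCode) rather than the OP's actual schema:

-- Keep the existing bigint IDENTITY primary key; store the business
-- code in its own column and enforce uniqueness on it.
ALTER TABLE Customers
    ADD ClientCode varchar(2) NULL;
GO
-- ...back-fill ClientCode with whatever generation rule is chosen,
-- then make it mandatory and unique...
ALTER TABLE Customers
    ALTER COLUMN ClientCode varchar(2) NOT NULL;

ALTER TABLE Customers
    ADD CONSTRAINT UQ_Customers_ClientCode UNIQUE (ClientCode);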

LeftyX
  • I have evidence in SQL Server 2005/2008 in a large system that a 7-8 character varchar key is actually *faster* than an int primary key. – lambacck May 30 '11 at 14:21
  • Might be. I tend to prefer a PK which is unique and which will never change. Personally I use a GUID (Comb). – LeftyX May 30 '11 at 14:25

My biggest concern with what you are proposing is that you will be limited to 333 primary records (99 numeric codes plus 9 × 26 = 234 alphanumeric ones). That seems like a small number.

Performing the change is a multi-step operation. You need to create the new field in the core table and all its related tables.

To do an in-place update, you need to generate the code in the core table, update all the related tables so they carry the code that corresponds to the old id, add the foreign key constraint on the new column in all the related tables, and finally remove the old key field from all the related tables, roughly as sketched below.
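A hedged T-SQL sketch of those steps, using one hypothetical related table (Orders) and invented column names (CustomerId for the old bigint key, ClientCode for the new code); each step would be repeated for every related table:

-- 1. Add the code to the core table and generate the values.
ALTER TABLE Customers ADD ClientCode varchar(2) NULL;
GO
-- ...populate Customers.ClientCode with the generation routine,
-- then make it NOT NULL and UNIQUE so it can be referenced...

-- 2. Add the code to each related table, filled in from the old id.
ALTER TABLE Orders ADD ClientCode varchar(2) NULL;
GO
UPDATE o
SET    o.ClientCode = c.ClientCode
FROM   Orders AS o
JOIN   Customers AS c ON c.CustomerId = o.CustomerId;

-- 3. Add the new foreign key constraint on the code column.
ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers_Code
    FOREIGN KEY (ClientCode) REFERENCES Customers (ClientCode);

-- 4. Drop the old foreign key and the old key column (old names assumed).
ALTER TABLE Orders DROP CONSTRAINT FK_Orders_Customers;
ALTER TABLE Orders DROP COLUMN CustomerId;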

We only did that in our development server. When we upgraded the live databases, we created a new database for each and copied the data over using a python script that queried the old database and inserted into the new database. I now update that script for every software upgrade so the core engine stays the same, but I can specify different tables or data modifications. I get the bonus of having a complete backup of the original database if something unexpected happens when upgrading production.

One strong argument in favor of a non-identity/guid code is that you want a human readable/memorable code and you need to be able to move records between two systems.

Performance is not necessarily a concern in SQL Server 2005 and 2008. We recently went through a change where we moved from int ids everywhere to 7 or 8 character "friendly" record codes. We expected to see some kind of performance hit, but we in fact saw a performance improvement.

We also found that we needed a way to quickly generate a code. Our codes have two parts, a 3 character alpha prefix and a 4 or 5 digit suffix. Once we had a large number of codes (15000-20000) we found it too slow to parse the code into prefix and suffix and find the lowest unused code (it took several seconds). Because of this, we also store the prefix and the suffix separately (in the primary key table) so that we can quickly find the next available lowest code with a particular prefix. The cached prefix and suffix made the search almost free.
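For example, with the prefix and suffix cached in separate columns (CodePrefix and CodeSuffix are invented names), the lowest free suffix for a given prefix can be found with a single query along these lines:

-- Lowest gap in the numeric suffixes for one prefix; returns NULL
-- when the prefix has no rows yet, and does not detect a gap before
-- the first existing suffix, so handle those cases in application code.
SELECT MIN(c.CodeSuffix + 1) AS NextSuffix
FROM   Customers AS c
WHERE  c.CodePrefix = 'ABC'
  AND NOT EXISTS (SELECT 1
                  FROM   Customers AS x
                  WHERE  x.CodePrefix = c.CodePrefix
                    AND  x.CodeSuffix = c.CodeSuffix + 1);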

We allow the codes to be changed, and the changed values propagate via cascade update rules on the foreign key relationships. We keep an identity key on the core code table to simplify updating the code.

We don't use an ORM, so I don't know what specific things to be aware of with that. We also have on the order of 60,000 primary keys in our biggest instance, but hundreds of related tables, some with millions of rows referencing the code table.

One big advantage we got is that, in many cases, we no longer need a join to perform operations. Everywhere in the software the user references things by the friendly code, so we don't have to look up the int ID (or join) to perform certain operations.

lambacck

The new code generation algorithm isn't worth thinking about. You can write a program to generate all possible codes in just a few lines of code. Put them in a table, and you're practically done. You just need to write a function to return the smallest one not yet used. Here's a Ruby program that will give you all the possible codes.

# test.rb -- generate a peculiar sequence of two-character codes.
i = 1
('A'..'Z').each do |c|
  (1..9).each do |n|
    printf("'%d%s', %d\n", n, c, i)
    i += 1
  end
end

The program writes CSV-style rows to standard output, which you should be able to redirect to a file and import easily into a table. You need the second column to control the sort order, because the new values don't naturally sort the way your requirements specify.
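Once the generated codes are imported, the "smallest one not yet used" function becomes a simple anti-join; ClientCodes(Code, SortOrder) is an assumed table here, matching the two columns above:

-- Lowest-ordered pre-generated code not yet assigned to a customer.
SELECT TOP (1) cc.Code
FROM   ClientCodes AS cc
WHERE  NOT EXISTS (SELECT 1
                   FROM   Customers AS c
                   WHERE  c.ClientCode = cc.Code)
ORDER BY cc.SortOrder;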

I'd be more concerned about the range than the algorithm. If you're right about the requirement, you're limited to 234 client codes (9 digits × 26 letters). If you're wrong, and the range extends from "1A" to "ZZ", you're limited to less than a thousand.

To implement this requirement in an existing table, you need to follow a careful procedure. I'd try it several times in a test environment before trying it on a production table. (This is just a sketch; there are a lot of details. A rough T-SQL illustration of the first steps follows the list.)

  • Create and populate a two-column table to map existing bigints to the new CHAR(2).
  • Create new CHAR(2) columns in all the tables that need them.
  • Update all the new CHAR(2) columns.
  • Create new NOT NULL UNIQUE or PRIMARY KEY constraints and new FOREIGN KEY constraints on the new CHAR(2) columns.
  • Rewrite user interface code (?) to target the new columns. (Might not be necessary if you rename the new CHAR(2) and old BIGINT columns.)
  • Set a target date to drop the old BIGINT columns and constraints.
  • And so on.
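A hedged sketch of the first three bullets, with every object name invented for illustration:

-- Mapping table from the existing surrogate key to the new code.
CREATE TABLE CustomerCodeMap (
    OldId   bigint  NOT NULL PRIMARY KEY,
    NewCode char(2) NOT NULL UNIQUE
);
-- ...populate CustomerCodeMap from the imported code list...

-- New CHAR(2) column on the core table, filled from the mapping.
ALTER TABLE Customers ADD ClientCode char(2) NULL;
GO
UPDATE c
SET    c.ClientCode = m.NewCode
FROM   Customers AS c
JOIN   CustomerCodeMap AS m ON m.OldId = c.CustomerId;

-- Repeat the ADD/UPDATE for every referencing table, then add the
-- UNIQUE/PRIMARY KEY and FOREIGN KEY constraints on the new columns.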
Mike Sherrill 'Cat Recall'

Not really addressing whether this is a good idea or not, but you can change your foreign keys to cascade updates. Once you've done that, when you update the primary key in the parent table, the corresponding key in the child tables is updated accordingly.
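For instance (the table, column, and constraint names here are assumed), a cascading foreign key looks like this:

-- Recreate the child foreign key so that changes to the parent key
-- propagate automatically to the child rows.
ALTER TABLE Orders DROP CONSTRAINT FK_Orders_Customers;

ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerId) REFERENCES Customers (CustomerId)
    ON UPDATE CASCADE;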

Ben Thul