Updating a primary key with another column of scrambled unique values

Question

Disclaimer: this change is not generally a useful thing to do to a properly normalized database, but I have business reasons for it.

I have a table of data with a primary key of numeric values. This key is used as a foreign key reference in multiple other tables. There is also a column of numeric values that can be updated to reflect the desired order for the rows. The order and PK columns contain the same numbers, but ordering the table by either column scrambles the other one.

What I'm trying to do is to update the primary key to follow the same order as the order column, but SSMS gives me the error "Violation of PRIMARY KEY constraint 'PK_Constraint'. Cannot insert duplicate key in object 'tbl'. The duplicate key value is <value>."

My update statement looks like this:

update tbl set tbl.key = tbl.order where tbl.key <> tbl.order

I already know how to update the foreign key references in the other tables, so I just need to know how I can update the key in this situation.

What does `SELECT COUNT(*), COUNT(DISTINCT [key]), COUNT(DISTINCT [Order]) FROM tbl` return? It should all be the same number. Why would you care if an internal system-only primary key number doesn't match a external end user ordering attribute? Primary key is only for internal system use — Nick.Mc, Aug 25 '16 at 23:20
In my case, I got `64, 64, 64` as the results. Keeping the primary key in the same order as the ordering attribute will be useful for maintenance on the potentially thousands of rows of data in other tables that will be constructed using this key. I'm trying to head off future problems. — Jed Schaaf, Aug 25 '16 at 23:44
My advice from a database perspectve: a seperate ordering attribute is a good idea. A bad idea is to have two columns with the same value, and it's usually bad to attach a real world value to a primary key. I'm not sure what future problems you envisage but I suggest that you're adding to them by doing this. Sorry I don't have any immediate thoughts on your original problem except maybe try it without the `WHERE` — Nick.Mc, Aug 26 '16 at 00:23
This table defines the steps in an event. There is another table that defines the substeps. Both have an internal sequence ID and an order field. The substep table has a FK ref to the PK of the step table as part of its own PK. Then there is another table that tracks individual instances of the events, and each instance only adds rows as each step/substep is completed. If the PK for each of the tables are in order, then it is easier to see the order of the steps/substeps for each instance without requiring joins. The configuration is set once and then not changed after events start occurring. — Jed Schaaf, Aug 26 '16 at 02:10
All becomes clear with the statement "without requiring joins". In a normalised database a join is a _good_ thing not a bad thing. Perhaps creating a view might make this easier for you. Do you want to avoid joins for performance reasons or code complexity reasons? Anyway I reiterate - what you are doing is a very bad idea from a database perspective. — Nick.Mc, Aug 26 '16 at 05:53
For example.... when someone decides to add a new step, but this step has to be order=10, then you need to reshuffle your entire database (all three tables) and push up entries 11 to 100 by one step. This is precisely what a relational database is good for - simplifying all of this plumbing code. If you talk to anyone who worked on an ISAM database they will explain what an improvement a relational database is. — Nick.Mc, Aug 26 '16 at 05:55
Thank you, @Nick.McDermaid. I personally do recognize these benefits of a normalized database, and this system is fairly well normalized. The purpose for this change is to help make this easier for the maintenance people to make updates on the other data in the instance tracking table, so it's more of a complexity reason. The maintenance people have corporate "recommendations" to avoid joins whenever possible, so doing this ordering up front means fewer problems later on. The definition tables are very unlikely to change later as well, so future inserts are a lesser consideration factor. — Jed Schaaf, Aug 26 '16 at 21:56

score 1 · Answer 1 · answered Aug 25 '16 at 22:05

1

Check to make sure that there are no duplicate values in tbl.Order. If there are, you must resolve the duplicates before you can update the PK column with those values.

SELECT
    order,COUNT(order) as NumDupes
FROM tbl
GROUP BY order
HAVING COUNT(order) > 1

answered Aug 25 '16 at 22:05

squillman

13,363
3
41
60

Ah, I had forgotten to check that. – Jed Schaaf Aug 25 '16 at 22:29
Ok, I just checked, and there are no duplicates in the order column. – Jed Schaaf Aug 25 '16 at 22:30

Jed Schaaf · Accepted Answer · 2018-07-17T22:17:51.480

I eventually figured out enough of the issue that I could solve this using a cursor. I'm putting my solution here for reference. If someone wants to simplify/modify this to use set-based queries, I'll accept that answer.

Step 1

Using a query from this answer, I found that there were a few order/ID "chains" that had one end that would result in a duplicate with a simple set-based update:

with parents as 
(
  select 1 idx, ID, Order, Name from tbl where ID <> Order
  union all 
  select idx+1, p.ID, v.Order, p.Name from parents p inner join tbl v on p.Order = v.ID and idx < 100
)
select parents from (
select distinct parents from (
select *, parents = stuff
    ( ( select ', ' + cast(p.Order as varchar(100)) from parents p 
        where t.ID = p.ID for xml path('')
    ) , 1, 2, '') from parents t ) x ) y
order by len(parents) desc

Step 2

I manually looked through the result set to find the longest row that ended with a given value. I then put the values from one chain into a temp table in the order given:

create table #tmp (id int identity(1,1), val int)
insert into #tmp values <list of values>

Step 3

Next I ran through the temp table with a cursor and updated each row (and foreign key references) individually:

declare @val int
declare @old int
declare val cursor for select val from #tmp order by id desc
open val
fetch next from val into @val
while @@fetch_status = 0
begin
    set @old = (select ID from tbl where Order = @val)

    insert into tbl(ID, <other columns>)
    select @val, <other columns> from tbl where ID = @old

    update <other tables> set FK_ID = @val where FK_ID = @old

    delete from tbl where ID = @old

    fetch next from val into @val
end;close val; deallocate val;

Step 4

I repeated steps 2 and 3 for each "chain". At the end, my table had the primary key in the same order as the Order field.

Updating a primary key with another column of scrambled unique values

2 Answers2

Step 1

Step 2

Step 3

Step 4