
I've read quite a few articles and SO posts on NEWID() when used for unique IDs. I'm trying to find out whether the function is suitable for my records when creating a composite primary key.

The ones that caught my eye were those that manipulate NEWID() in some way to generate a unique ID. However, a question comes to mind: when you manipulate NEWID() as in the example below, does the result still follow the same principle of being unique, or is it merely random?

RIGHT('000000000' + CAST(ABS(CHECKSUM(NEWID())) % 999999999 AS VARCHAR(9)), 9)

The purpose is to combine the result of the above with a Customer ID in a table that contains duplicate Customer IDs as well as records with no Customer ID (i.e., 0 as their ID). For example, the above code in action with the Customer ID:

New ID - 49302304954983 (Customer with 49302 ID)
New ID - 0480384348 (Customer with 0 ID)

That way I can quickly identify customers with an original ID and those without one. Then I can use the new composite key as a primary key that is unique to the record.
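To make the scheme concrete, here is a minimal sketch of the construction, assuming a hypothetical Customers table with an integer CustomerID column (the table and column names are illustrative, not from any real schema):

-- Append a NEWID()-derived 9-digit suffix to each Customer ID
SELECT CAST(CustomerID AS VARCHAR(10))
       + RIGHT('000000000' + CAST(ABS(CHECKSUM(NEWID())) % 999999999 AS VARCHAR(9)), 9)
       AS NewCompositeID
FROM Customers;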

Thanks for your assistance. I still have a great deal to learn about proper database development, and I'm trying to stick to good practices that are both simple and sound. Please don't hesitate to give your input on this approach to assigning a new primary key. I welcome all feedback, good or bad, on top of an answer to my question if possible. :)

Fastidious

2 Answers


NEWID() will be unique, but the result of the expression will not be. NEWID() returns a 16-byte uniqueidentifier (36 characters in string form). The expression runs it through CHECKSUM(), which yields a 32-bit integer with only about 4 billion possible values, and then ABS(...) % 999999999 reduces that further to fewer than one billion values before converting it back to a string of digits.

There are vastly more possible NEWID() values than possible values for this expression. By the pigeonhole principle, duplicates are inevitable once you generate enough of them.
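You can see this empirically. Here is a rough sketch, using sys.all_objects cross-joined with itself purely as an assumed row source to manufacture a million rows; by the birthday approximation, a million draws from fewer than a billion possible values should already produce several hundred collisions:

;WITH ids AS (
    -- Generate 1,000,000 values with the expression from the question
    SELECT TOP (1000000)
           RIGHT('000000000' + CAST(ABS(CHECKSUM(NEWID())) % 999999999 AS VARCHAR(9)), 9) AS id
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
SELECT COUNT(*) AS generated, COUNT(DISTINCT id) AS distinct_ids
FROM ids;
-- distinct_ids comes back smaller than generated: duplicates exist.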

Gordon Linoff
  • Great explanation! This makes a lot of sense after reviewing the pigeonhole principle. The only question is what would be better than NEWID() if NEWID() is bad for primary keys? Or am I thinking of it being bad for clustered indexing? – Fastidious Jan 16 '14 at 03:54
  • @Fastidious . . . Is there a reason you don't want to use an auto-incrementing `identity` column for the primary key? – Gordon Linoff Jan 16 '14 at 03:56
  • I can, but I also have to group related records together with an ID. I guess in theory it's not unique to the table, but it is unique to the group of records. Then I'm selecting those distinctly into a new table where they are unique to that table. So the individual record can use an auto-incrementing ID like you suggested, but the group ID can't. – Fastidious Jan 16 '14 at 04:08
  • @Fastidious - Could you use, say, the identity value assigned to the first record in the group as the identity for the whole group? It will always be unique within the table. The group ids won't be dense, but that obviously isn't a requirement. An OUTPUT clause on the move to the new table would let you get all of the row ids and you could use that to perform any updates. It's not quite clear where in the process you are forming the groups. – HABO Jan 16 '14 at 04:44
  • Well, I'm going to use a CTE to select the Customer ID with DISTINCT. Then I'll have to generate a unique ID and join back to the table to apply that new unique ID to every record with that Customer ID. Right now, I'm using NEWID() to generate that unique ID for the group of records. I could technically take the first auto-incremented value (i.e., 12) and use that as the unique ID instead of NEWID(), if that's what you mean. – Fastidious Jan 16 '14 at 05:27
  • @Fastidious . . . In a CTE, you can also use `row_number()` to assign a sequential number. – Gordon Linoff Jan 16 '14 at 12:23
  • @Fastidious - Yes, that's what I'm suggesting. You could use a transaction isolation level that ensures that only one session can write to the table and use IDENT_CURRENT() to get a value to assign to the new group before moving the rows. Alternatively, move the rows and use an OUTPUT clause to record the id's, then select one id value to assign to the group and UPDATE the rows to form the group. And GUIDs are almost unique: [ref](http://stackoverflow.com/questions/39771/is-a-guid-unique-100-of-the-time). – HABO Jan 16 '14 at 13:19
  • Yeah, but I do that when ordering by date and timestamp to determine its placement. A good question is how you can group by ID, assign a unique ID to that group, and have it partition by a third value within that group. I wish I could do a NEWID() OVER (PARTITION BY CustomerID, Action): then the unique ID would be assigned to a group of records, but also form a subgroup if that action exists! (See the sketch after these comments.) – Fastidious Jan 16 '14 at 13:23
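Something close to what the comments describe can be done with window functions. A minimal sketch, assuming a hypothetical #Orders table with an identity column ID plus CustomerID and Action columns: the smallest identity value in each customer group stands in for the group ID (HABO's suggestion), and DENSE_RANK() numbers the (CustomerID, Action) subgroups.

SELECT ID,
       CustomerID,
       [Action],
       MIN(ID) OVER (PARTITION BY CustomerID)            AS GroupID,    -- first identity value in the group
       DENSE_RANK() OVER (ORDER BY CustomerID, [Action]) AS SubgroupID  -- one number per (CustomerID, Action) pair
FROM #Orders;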

You have CHECKSUM(NEWID()) in your expression.

NEWID() by itself is guaranteed to be unique. However, once you take a CHECKSUM() of it, you can no longer guarantee uniqueness.

CHECKSUM() is a hash function. Hash functions have a low probability of collisions, but they do not guarantee uniqueness.
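As a rough demonstration (again using sys.all_objects cross-joined with itself as an assumed row source to manufacture a million rows), hashing a batch of fresh GUIDs and grouping on the hash will surface colliding values, since a million draws from a 32-bit space are expected to produce over a hundred duplicates:

;WITH hashes AS (
    -- Hash 1,000,000 fresh GUIDs into 32-bit integers
    SELECT TOP (1000000) CHECKSUM(NEWID()) AS h
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
SELECT h, COUNT(*) AS occurrences
FROM hashes
GROUP BY h
HAVING COUNT(*) > 1;   -- any rows returned are CHECKSUM collisions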

Aheho