Is the SQL Server identity function reproducible

Question

I want to add an identity column to a table that does not have an ID column so far using

Alter Table Names
    Add Id_new Int Identity(1, 1)

EDIT: My question seemed to have led to some confusion, so I try to clarif: We got a few thousand tables from a customer to do analysis. Most of them do not have and ID column. We load the tables from the client, however do not know what is inside. For further analysis we want to add our own ID column.

It might happen, that the edited tables might be accidentally dropped. When this happens, we need be able to reload the data, add again the ID column and the ID column has to be exactly the same order as it was the first time we loaded the data. Otherwise, if we refer to a the ID in a downstream analysis, we would refer to a different row than before.

So: Is it the case, that the ID numbers added to a table are always the same order, given that the data in the table is the same?

Remarks: With my colleagues we discussed the possibility that we could use an order by clause over all columns. That would create what we want. However, we have thousands of tables, so this would be quite a pain. The data is a dump that we want to analyze for a client. So the question is, if there is an easier, reproducible way to add and ID to all rows in all tables.

Huh? I don't understand what you are asking here. What command are you talking about? There is no such thing as an identity function. — Sean Lange, Mar 23 '18 at 15:20
In SQL Server you don't have any control over which row gets which identity value. https://stackoverflow.com/questions/9414826/sql-server-how-to-add-new-identity-column-and-populate-column-with-ids — , Mar 23 '18 at 15:21
@SeanLange I thinks he’s asking how to control the order and value of the ids assigned. — RBarryYoung, Mar 23 '18 at 15:22
@RBarryYoung Maybe, but the wording of the question is really unclear and it could mean at least a few other things as well. — Tab Alleman, Mar 23 '18 at 15:23
The way to do this is to add an identity column, the use `SET IDENTITY_INSERT` to overwrite the ID column with the explicit values you want each row to have. — RBarryYoung, Mar 23 '18 at 15:25
In SQL tables have no *inherent* ordering (that is "user visible", at least). The only time you can get any ordering that you can *depend* upon is when you're allowed to specify an `ORDER BY` clause. — Damien_The_Unbeliever, Mar 23 '18 at 15:25
Can the stored order of a table change in time? Physical order - absolutely, Logical Order - if its clustered, only when data changes, if its a heap, then there is no notion of order. — Andrew, Mar 23 '18 at 15:26
I can't work out if the OP is asking if they want to control the order, or asking if, if they were to add a second `IDENTITY` column would they have the same value (which is a no, as you can only have 1 `IDENTITY` column). if the former, then no, but you could update the values after. *(To be honest, the OP has enough rep that they should know how to ask a meaningful question).* — Thom A, Mar 23 '18 at 15:26
@TabAlleman Agreed the wording is confusing, but that’s how I read it intent. — RBarryYoung, Mar 23 '18 at 15:26
Is what you're asking similar to: "I have two tables with identical data. If I add an IDENTITY column to each table, can I guarantee that the same rows have the same IDENTITY value in each table"? — pmbAustin, Mar 23 '18 at 15:32
No. If you drop a table of data and recreate it there is no way to know what the order of your identity would be. Let me turn the tables a little bit here. What would you expect? A table by definition is an unordered set. — Sean Lange, Mar 23 '18 at 15:35
@pmbAustin : Yes that is exactly what I want, and we need to do this for thousands of tables that we dont know the content of. — aldorado, Mar 23 '18 at 15:35
It is easy to add the identity column, but it is impossible to be certain that the same row would always get the same identity value when you drop the table and recreate it. — Sean Lange, Mar 23 '18 at 15:36
@aldorado update the question and explain what you want then. It can be done but not with IDENTITY. There's no "same rows" in databases btw — Panagiotis Kanavos, Mar 23 '18 at 15:36
If the two tables have the same primary key, then I think the answer is probably yes, otherwise probably no. The thing is, this is not documented to be deterministic, so if you get the result you want (identical identity values for all matching rows across tables), it's more by luck or implementation detail than by design. It doesn't seem like something you can or should rely on. — pmbAustin, Mar 23 '18 at 15:36
How can you possibly not know the content of the tables, but know that they are identical? Your edit has only made the question more confusing to me. — Tab Alleman, Mar 23 '18 at 15:42
Agree with @TabAlleman, this edit just made it even more confusing. Are you asking "*If I have to reload a table's data with mostly old data, but some new data, how can I insure that the same ID's get assigned to the same rows?*" If not, the please try to clarify again. — RBarryYoung, Mar 23 '18 at 16:02
@TabAlleman I tried to clarify again, please let me know if this explains the issue — aldorado, Mar 26 '18 at 07:24

paparazzo · Answer 1 · 2018-03-24T09:57:45.893

The question is why are you re-adding the Add Id_new Int Identity(1, 1)?

A table is not ordered. That command is not guaranteed to repeat.

Your data needs to supply its own PK if you want it to repeat.

If you create the identity first and then insert the identity will be in the insert order but it could have gaps.

If all rows produce a unique sort then this should work

select col1, col2, col3 
     , row_number() over (order by col1, col2, col3) as rn 
 from table;

If you are reloading tables then you can just truncate.

score 1 · Answer 2 · answered Mar 23 '18 at 15:47

IDENTITY generates incrementing numbers for specific tables. You can't ensure that rows added to different tables will get the same numbers. That's not what IDENTITY is for anyway.

SQL Server can generate sequence numbers for such cases. You can use a SEQUENCE instead to generate incrementing numbers that can be used in multiple tables, or even to ensure unique numbers across tables.

How you use sequences depends on how you insert the data. The simplest way would be to get a new value and use it to insert rows in multiple tables :

SET @myID = NEXT VALUE FOR Test.CountBy1 ; 
INSERT INTO Table1 (ID,....) VALUES(@myID,....);

That's not very helpful in bulk import scenarios though.

Another option is to reserve an entire range of numbers using the sp_sequence_get_range stored procedure. You can ask for eg 10 numbers and the function will return the starting number and ensure the next 10 numbers won't be generated for anyone else :

DECLARE @range_first_value sql_variant ,   
        @range_first_value_output sql_variant ;  

EXEC sp_sequence_get_range  
@sequence_name = N'RangeSeq'  
, @range_size = 10  
, @range_first_value = @range_first_value_output OUTPUT ;  

SELECT @range_first_value_output AS FirstNumber ;

This can be used in ETL scripts or code to generate incrementing numbers and assign them to rows.

Generating unique numbers across tables is a lot easier. Just use NEXT VALUE FOR ... as the default value of the ID column

score 0 · Answer 3 · answered Mar 23 '18 at 15:57

It is a complex task that could be solved in a many ways. I will try to explain how would I solve it.

Step 1: Generate scripts for all tables: how do i generate scripts for all tables with single stroke in sql server 2000

Step 2: Add Identity column to all the tables. To do this we need to edit the script generated in Step 1 and replace all instances of ) ON [PRIMARY] with , IdCol int IDENTITY(1,1)) ON [PRIMARY]

Step 3: We need to alter table name so that it is different from the original one. Replace all ].[ with ].[temp1234_

Step 4: Run the script. This will create empty copies of all the tables with the IdCol as identity column.

Step 5: Generate INSERT scripts for all the tables. Run the following query:

select 'INSERT INTO ['+name+'] SELECT * from ['+REPLACE(name, 'temp1234_', '')+'] order by column_name' 
from sys.all_objects where name like 'temp1234_%'

This will give you a list of rows with the INSERT script for each table. Replace column_name with the name of the column that could be used as order by clause. If the names of the columns vary by table you would have to come up with the way of specifying the correct column name for each table.

Step 6: Copy all the rows, run, wait to complete, check the results.

As the others have already stated - there is no control over the values Identity assigns to each row, so to have the control you would need to re-insert the data into new tables.

IF your table already have the column that you would like to use as identity, without having to re-assign values, there might be the way to do it. Let me know if it is the case.

Is the SQL Server identity function reproducible

3 Answers3