SQL Server or Python adding a column to keep new insert as 0 meanwhile make existing table auto increment

Question

I've been searching many similar posts here but they seem not to meet my requirement to use SQL Server IDENTITY.

I have a dataframe extracted from another data source which might include duplicate Name.

Dataframe

Date	Time	Name	Result
3/28/2022	7:00:00	abc	0.23
3/28/2022	7:00:00	abc	0.19
3/28/2022	7:05:00	def	0.36
3/28/2022	7:10:00	ghi	0.29

When I load to SQL Server, I need to create a Retest_Count column in order to indicate the test order. If the Name is duplicated and same timestamp, always set the newer one to be 0 and the older one to be 1 based on the 'original data order'. Like this:

SQL Table 1

Date	Time	Name	Result	Retest_Count
3/28/2022	7:00:00	abc	0.23	1
3/28/2022	7:00:00	abc	0.19	0
3/28/2022	7:05:00	def	0.36	0
3/28/2022	7:10:00	ghi	0.29	0

When I insert new data with same Name but different timestamp, always give 0 to the new data and the existing table need to get auto increment like this:

SQL Table 2

Date	Time	Name	Result	Retest_Count
3/28/2022	7:00:00	abc	0.23	2
3/28/2022	7:00:00	abc	0.19	1
3/29/2022	13:00:00	abc	0.18	0
3/28/2022	7:05:00	def	0.36	0
3/28/2022	7:10:00	ghi	0.29	1
3/29/2022	21:05:00	ghi	0.25	0

Is this possible to achieve by SQL Server script? Python solution is considered also.

This would seem better as a column in a `VIEW`, rather than a persisted value. You would still need an `IDENTITY` on your table, however, to denote the order of the data (as i assume it's coincidence that `Result` is in ascending order). — Thom A, Mar 28 '22 at 16:10
The purpose for this additional column is when I set Retest_Count = '0', I can always refer to the most updated data and filter out the old one. — Kelvin Lo, Mar 28 '22 at 16:14
You don't need a column like `Retest_Count` to achieve that, you could just use a [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) (which, again, you could use a `VIEW` for easy use). This smells like an [XY Problem](//xyproblem.info). — Thom A, Mar 28 '22 at 16:16
Thank you! "Get top 1 row of each group" seems like a solution but if a duplicate data with same timestamp in the table, perhaps it couldn't identify. I'll still try it out — Kelvin Lo, Mar 28 '22 at 16:24

score 1 · Accepted Answer · answered Mar 28 '22 at 16:23

As I mentioned in the comments, this is far better suited for a VIEW. You will need to ensure you have some kind of column in your table that always increases (such as an IDENTITY) to achieve this though (as I note you have 2 tests that occur at the same time for the same Name, so there isn't a way to denote which is "first").

Then you can create a VIEW like this:

CREATE VIEW dbo.YourView_RetestCounts AS
    SELECT [Date],
           [Time],
           [Name],
           Result,
           ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY IdentityColumn DESC) - 1 AS Retest_Count
    FROM dbo.YourTable;

If, however, your goal is to filter to the latest row, as your comments suggest, then use a VIEW with a Get top 1 row of each group solution:

CREATE VIEW dbo.YourView_LatestTest AS
    WITH CTE AS(
        SELECT [Date],
               [Time],
               [Name],
               Result,
               ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY IdentityColumn DESC) AS Retest_Count
        FROM dbo.YourTable)
    SELECT [Date],
           [Time],
           [Name],
           Result
    FROM CTE
    WHERE Retest_Count = 1;

When I used `ORDER BY [Date], [Time] DESC`, neither one gives me the reverse order. The latest record always has larger Retest_Count. Do you know why? — Kelvin Lo, Mar 29 '22 at 07:09

SQL Server or Python adding a column to keep new insert as 0 meanwhile make existing table auto increment

1 Answers1