SQL: Adding an ID Column to a Table

Question

I am working with Netezza SQL.

I want to add an ID column to my table (e.g. row 1 - id = 1, row 2 - id = 2, etc.) - each row should have a unique ID value.

Usually, this is not a problem. I just run the following code:

SELECT ROW_NUMBER() OVER(ORDER BY column_name) AS id, column1, column2, column3
FROM your_table;

However, the above code assumes that there exists some column ("column_name") with no duplicates - if this column does contain duplicates, then the ID column will contain duplicates as well.

In a previous question (SQL: Using CASE WHEN for Missing Rows), I learned that Netezza contains a built-in table with integers from 0,1,2,3...n . This table is called _v_vector_idx and contains a single column (IDX) with these integers.

My Question: I am trying to learn how I can use this _v_vector_idx table to create ID's in my table - I had thought of doing this using an INNER JOIN, but I am not sure what column I would use for this join.

Can someone please show me how to do this?

Thanks!

Note: I found an indirect way to accomplish something similar

select *,cast(random()* 10000 as integer) as random_number from your_table

But apart from these being random ID's (not in ascending order from 0 onwards) - I have no way of guaranteeing that some of these random ID's were not coincidentally created twice. I know that I can check this manually using a DISTINCT statement - but this is very time consuming and I was hoping for a more direct way.

Have you tried `ROW_NUMBER() OVER()` or `ROW_NUMBER() OVER(PARTITION BY 1)`? — Atmo, Mar 16 '23 at 16:27
Read your question again and I see a mistake in your question: `ROW_NUMBER()` does not need the column to have no duplicates. It will generate unique ids even if `column_name` has duplicates. The window functions that generate the same id for duplicate values in `column_name` are `RANK()` and `DENSE_RANK()` — Atmo, Mar 17 '23 at 00:36

Paul Maxwell · Answer 1 · 2023-03-17T03:58:09.113

This is untrue:

SELECT ROW_NUMBER() OVER(ORDER BY column_name) AS id, column1, column2, column3 FROM your_table; However, the above code assumes that there exists some column ("column_name") with no duplicates

The ORDER BY used in the OVER clause does NOT require any of the columns specified to be unique. Regardless of how non-unique the ordering is, ROW_NUMBER will not repeat an integer and every row gets a unique row number (unless you are also using PARTITION BY)

So I would contend that you simply do not need to use _v_vector_idx

ROW NUMBER Assigns a unique number to each row to which it is applied. ROW_NUMBER() over(window_spec)

https://www.ibm.com/docs/en/psfa/7.2.1?topic=reference-functions

Where a table of numbers can be useful is if looking for gaps in another table. Let's assume you have a serial number of some sort held in table1:

select numbertable.value
from numbertable
left join table1 on numbertable.value = table1.serialno
where table1.serialno IS NULL
and numbertable.value between 10000 and 99999

This would list the numbers not yet used by the table1.serialno in a given range

nb: I'm not a Netezza user so I don't know the details of _v_vector_idx

Utkarsh Tiwari · Answer 2 · 2023-03-17T02:33:39.093

1

Actually if you wanted to find duplicates with the row number () you can either use dense_rank() or partition by .

For Ex:-

select row_number() over (partition by Column_name order by id), * from table_name.

Or

select dense_rank() over (order by column_name),* from table_name.

edited Mar 17 '23 at 02:33

answered Mar 17 '23 at 02:31

Utkarsh Tiwari

11
2

score 0 · Accepted Answer · answered Jun 16 '23 at 06:43

0

This is the most straightforward answer I could find:

SELECT ROW_NUMBER() OVER () AS id, * FROM df

answered Jun 16 '23 at 06:43

stats_noob

5,401
4
27
83

SQL: Adding an ID Column to a Table

3 Answers3

Linked