0

I am working with Netezza SQL.

I want to add an ID column to my table (e.g. row 1 - id = 1, row 2 - id = 2, etc.) - each row should have a unique ID value.

Usually, this is not a problem. I just run the following code:

SELECT ROW_NUMBER() OVER(ORDER BY column_name) AS id, column1, column2, column3
FROM your_table;

However, the above code assumes that there exists some column ("column_name") with no duplicates - if this column does contain duplicates, then the ID column will contain duplicates as well.

In a previous question (SQL: Using CASE WHEN for Missing Rows), I learned that Netezza contains a built-in table with integers from 0,1,2,3...n . This table is called _v_vector_idx and contains a single column (IDX) with these integers.

My Question: I am trying to learn how I can use this _v_vector_idx table to create ID's in my table - I had thought of doing this using an INNER JOIN, but I am not sure what column I would use for this join.

Can someone please show me how to do this?

Thanks!

Note: I found an indirect way to accomplish something similar

select *,cast(random()* 10000 as integer) as random_number from your_table

But apart from these being random ID's (not in ascending order from 0 onwards) - I have no way of guaranteeing that some of these random ID's were not coincidentally created twice. I know that I can check this manually using a DISTINCT statement - but this is very time consuming and I was hoping for a more direct way.

stats_noob
  • 5,401
  • 4
  • 27
  • 83
  • Have you tried `ROW_NUMBER() OVER()` or `ROW_NUMBER() OVER(PARTITION BY 1)`? – Atmo Mar 16 '23 at 16:27
  • error: row number must include order by specification – stats_noob Mar 16 '23 at 17:01
  • Read your question again and I see a mistake in your question: `ROW_NUMBER()` does not need the column to have no duplicates. It will generate unique ids even if `column_name` has duplicates. The window functions that generate the same id for duplicate values in `column_name` are `RANK()` and `DENSE_RANK()` – Atmo Mar 17 '23 at 00:36

3 Answers3

1

This is untrue:

SELECT ROW_NUMBER() OVER(ORDER BY column_name) AS id, column1, column2, column3 FROM your_table; However, the above code assumes that there exists some column ("column_name") with no duplicates

The ORDER BY used in the OVER clause does NOT require any of the columns specified to be unique. Regardless of how non-unique the ordering is, ROW_NUMBER will not repeat an integer and every row gets a unique row number (unless you are also using PARTITION BY)

So I would contend that you simply do not need to use _v_vector_idx

ROW NUMBER Assigns a unique number to each row to which it is applied. ROW_NUMBER() over(window_spec)

https://www.ibm.com/docs/en/psfa/7.2.1?topic=reference-functions

Where a table of numbers can be useful is if looking for gaps in another table. Let's assume you have a serial number of some sort held in table1:

select numbertable.value
from numbertable
left join table1 on numbertable.value = table1.serialno
where table1.serialno IS NULL
and numbertable.value between 10000 and 99999

This would list the numbers not yet used by the table1.serialno in a given range

nb: I'm not a Netezza user so I don't know the details of _v_vector_idx

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
1

Actually if you wanted to find duplicates with the row number () you can either use dense_rank() or partition by .

For Ex:-

select row_number() over (partition by Column_name order by id), * from table_name.

Or

select dense_rank() over (order by column_name),* from table_name.
0

This is the most straightforward answer I could find:

SELECT ROW_NUMBER() OVER () AS id, * FROM df
stats_noob
  • 5,401
  • 4
  • 27
  • 83