0

In SQL (I'm using postgres, but am open to other variations), is it possible to update a value based on a row location and a column name when the table doesn't have unique rows or keys? ...without adding a column that contains unique values?

For example, consider the table:

col1 col2 col3
1 1 1
1 1 1
1 1 1

I would like to update the table based on the row number or numbers. For example, change the values of rows 1 and 3, col2 to 5 like so:

col1 col2 col3
1 5 1
1 1 1
1 5 1

I can start with the example table:

CREATE TABLE test_table (col1 int, col2 int, col3 int);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);

Now, I could add an additional column, say "id" and simply:

UPDATE test_table SET col2 = 5 WHERE id = 1
UPDATE test_table SET col2 = 5 WHERE id = 3 

But can this be done just based on row number?

I can select based on row number using something like:

SELECT * FROM (
                SELECT *, ROW_NUMBER() OVER() FROM test_table 
                ) as sub
WHERE row_number BETWEEN 1 AND 2

But this doesn't seem to play well with the update function (at least in postgres). Likewise, I have tried using some subsets or common table expressions, but again, I'm running into difficulties with the UPDATE aspect. How can I perform something that accomplishes something like this pseudo code?: UPDATE <my table> SET <col name> = <new value> WHERE row_number = 1 or 3, or... This is trivial other languages like R or python (e.g., using pandas's .iloc function). It would be interesting to know how to do this in SQL.

Edit: in my table example, I should have specified the column types to something like int.

Docuemada
  • 1,703
  • 2
  • 25
  • 44
  • 1
    "I can select based on row number using something like: ...". This is not deterministic. This `row_number` is not related to any "insertion order" or other. Your query may return different results every time. See https://stackoverflow.com/questions/18613055/is-order-by-and-row-number-deterministic – Marth Feb 26 '21 at 21:12
  • 1
    SQL tables are unordered by definition of relation (of course, they have some internal ordering that DBMS uses to access each row), so there's no Nth row in the table without explicit order. Your spreadsheet data is explicitly ordered, because you can find the ordinal position of row from top to bottom, despite that there's no any written numbers in it. You can assign explicit row index with some functions like `row_number() over(order by ...)`, but it must have some `order by` – astentx Feb 26 '21 at 21:30
  • Great points on order (or lack there of). It might be that I could use the DBMS's pseudo id (e.g., ctid in postgres) if I really wanted to pursue the mental exercise. – Docuemada Feb 26 '21 at 21:58

1 Answers1

1

This is one of the many instances where you should embrace the lesser evil that is Surrogate Keys. Whichever table has a primary key of (col1,col2,col3) should have an additional key created by the system, such as an identity or GUID.

You don't specify the data type of (col1,col2,col3), but if for some reason you're allergic to surrogate keys you can embrace the slightly greater evil of a "combined key", where instead of a database-created value your unique key field is derived from some other fields. (In this instance, it'd be something like CONCAT(col1, '-', col2, '-', col3) ).

Should neither of the above be practical, you will be left with the greatest evil of having to manually specify all three columns each time you query a record. Which means that any other object or table which references this one will need to have not one but three distinct fields to identify which record you're talking about.

Ideally, btw, you would have some business key in the actual data which you can guarantee by design will be unique, never-changing, and never-blank. (Or at least changing so infrequently that the db can handle cascade updates reasonably well.)

You may wind up using a surrogate key for performance in such a case anyway, but that's an implementation detail rather than a data modeling requirement.