Update based on same column

Question

I have a table like this:

ID | Flag
-----------
 1 | True
 1 | True
 1 | NULL
 1 | True
 1 | NULL
 2 | False
 2 | False
 2 | False
 2 | NULL
 2 | NULL

And I want an output like this:

ID | Flag
-----------
 1 | True
 1 | True
 1 | True
 1 | True
 1 | True
 2 | False
 2 | False
 2 | False
 2 | False
 2 | False

I want to replace nulls with the value assigned in different records. Is there a way to do it in a single update statement?

Why do you have ID but it's not a Primary Key ? You should change the name to [SOMETHING_ID], instead of 'ID' only. Then add new column 'ID' as a primary key, it will make your life easier. — Johan, Sep 21 '22 at 03:14

score 2 · Answer 1 · answered May 31 '20 at 23:47

One option uses a correlated subquery:

update mytable t
set flag = (select bool_or(flag) from mytable t1 where t1.id = t.id)

Demo on DB Fiddle:

id | flag
-: | :---
 1 | t   
 1 | t   
 1 | t   
 1 | t   
 1 | t   
 2 | f   
 2 | f   
 2 | f   
 2 | f   
 2 | f

Gordon Linoff · Answer 2 · 2020-06-01T00:12:23.683

You can also use exists:

update t
     set flag = exists (select 1 from t t2 where t2.id = t.id and t2.flag);

The advantage of exists over a subquery with aggregation is performance: the query can stop at the first row where flag is true. This is a simple index lookup on an index on (id, flag).

Performance would be more improved by limiting the number of rows being updated. That actually suggests two separate statements:

update t
    set flag = true
    where (flag is null or not flag) and
          exists (select 1 from t t2 where t2.id = t.id and t2.flag);

update t
    set flag = false
    where (flag is null or flag) and
          not exists (select 1 from t t2 where t2.id = t.id and not t2.flag);

These could be combined into a single (more complicated) statement, but the sets being updated are disjoint. This limits the updates to the rows that need to be updated, as well as limiting the subquery to a simple lookup (assuming an index on (id, flag)).

score 0 · Answer 3 · answered Jun 01 '20 at 01:06

The answers provided satisfy your sample data, but may still leave you short of a satisfactory answer. That is because your sample data is missing a couple significant sets. What happens if you had the following, either instead of or in addition to your current sample data?

+----+-------+
| id | flag  |
+----+-------+
|  3 | true  |
|  3 | false |
|  3 | null  |
|  4 | null  |
|  4 | null  |
+----+-------+

The answer could be significantly different.

Erwin Brandstetter · Answer 4 · 2020-06-01T02:48:17.743

Assuming (like your sample data suggests):

There can never be the same id with true and false in the set. Else, you'd have to define what to do.
null values remain unchanged if there is no non-null value for the same id.

This should give you best performance:

UPDATE tbl t
SET    flag = t1.flag
FROM  (
   SELECT DISTINCT ON (id)
          id, flag
   FROM   tbl
   ORDER  BY id, flag
   ) t1                      -- avoid repeated computation for same id
WHERE  t.id = t1.id
AND    t.flag IS NULL        -- avoid costly no-op updates
AND    t1.flag IS NOT NULL;  -- avoid costly no-op updates;

db<>fiddle here

The subquery t1 distills target values per id once.

SELECT DISTINCT ON (id)
       id, flag
FROM   tbl
ORDER  BY id, flag;

Since null sorts last, it effectively grabs the first non-null value per id. false sorts before true, but that has no bearing on the case as there can never be both for the same id. See:

If you have many rows per id, there are faster techniques:

Optimize GROUP BY query to retrieve latest row per user

The added conditions in the outer query prevent all no-op updates from happening, thus avoiding major cost. Only rows are updated where a null value actually changes. See:

How do I (or can I) SELECT DISTINCT on multiple columns?

Update based on same column

4 Answers4