0

I have a table like this:

ID | Flag
-----------
 1 | True
 1 | True
 1 | NULL
 1 | True
 1 | NULL
 2 | False
 2 | False
 2 | False
 2 | NULL
 2 | NULL

And I want an output like this:

ID | Flag
-----------
 1 | True
 1 | True
 1 | True
 1 | True
 1 | True
 2 | False
 2 | False
 2 | False
 2 | False
 2 | False

I want to replace nulls with the value assigned in different records. Is there a way to do it in a single update statement?

GMB
  • 216,147
  • 25
  • 84
  • 135
Garfield
  • 143
  • 11
  • Why do you have ID but it's not a Primary Key ? You should change the name to [SOMETHING_ID], instead of 'ID' only. Then add new column 'ID' as a primary key, it will make your life easier. – Johan Sep 21 '22 at 03:14

4 Answers4

2

One option uses a correlated subquery:

update mytable t
set flag = (select bool_or(flag) from mytable t1 where t1.id = t.id)

Demo on DB Fiddle:

id | flag
-: | :---
 1 | t   
 1 | t   
 1 | t   
 1 | t   
 1 | t   
 2 | f   
 2 | f   
 2 | f   
 2 | f   
 2 | f   
GMB
  • 216,147
  • 25
  • 84
  • 135
1

You can also use exists:

update t
     set flag = exists (select 1 from t t2 where t2.id = t.id and t2.flag);

The advantage of exists over a subquery with aggregation is performance: the query can stop at the first row where flag is true. This is a simple index lookup on an index on (id, flag).

Performance would be more improved by limiting the number of rows being updated. That actually suggests two separate statements:

update t
    set flag = true
    where (flag is null or not flag) and
          exists (select 1 from t t2 where t2.id = t.id and t2.flag);

update t
    set flag = false
    where (flag is null or flag) and
          not exists (select 1 from t t2 where t2.id = t.id and not t2.flag);

These could be combined into a single (more complicated) statement, but the sets being updated are disjoint. This limits the updates to the rows that need to be updated, as well as limiting the subquery to a simple lookup (assuming an index on (id, flag)).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

The answers provided satisfy your sample data, but may still leave you short of a satisfactory answer. That is because your sample data is missing a couple significant sets. What happens if you had the following, either instead of or in addition to your current sample data?

+----+-------+
| id | flag  |
+----+-------+
|  3 | true  |
|  3 | false |
|  3 | null  |
|  4 | null  |
|  4 | null  |
+----+-------+

The answer could be significantly different.

Belayer
  • 13,578
  • 2
  • 11
  • 22
0

Assuming (like your sample data suggests):

  • There can never be the same id with true and false in the set. Else, you'd have to define what to do.

  • null values remain unchanged if there is no non-null value for the same id.

This should give you best performance:

UPDATE tbl t
SET    flag = t1.flag
FROM  (
   SELECT DISTINCT ON (id)
          id, flag
   FROM   tbl
   ORDER  BY id, flag
   ) t1                      -- avoid repeated computation for same id
WHERE  t.id = t1.id
AND    t.flag IS NULL        -- avoid costly no-op updates
AND    t1.flag IS NOT NULL;  -- avoid costly no-op updates;

db<>fiddle here

The subquery t1 distills target values per id once.

SELECT DISTINCT ON (id)
       id, flag
FROM   tbl
ORDER  BY id, flag;

Since null sorts last, it effectively grabs the first non-null value per id. false sorts before true, but that has no bearing on the case as there can never be both for the same id. See:

If you have many rows per id, there are faster techniques:

The added conditions in the outer query prevent all no-op updates from happening, thus avoiding major cost. Only rows are updated where a null value actually changes. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228