
I need a more efficient way to update rows of a single table in Postgres 9.5. I am currently doing it with pg_dump and re-importing with updated values after search-and-replace operations in a Linux environment.

table_a has 300,000 rows with 2 columns: id bigint and json_col jsonb. json_col has about 30 keys, "C1" to "C30", as in this example:

Table_A

    id,json_col
    1  {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Tokyo", ... "C30":"Dallas"}
    2  {"C1":"Dublin","C2":"Berlin","C3":"Kiev","C4":"Tokyo", ... "C30":"Phoenix"}
    3  {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Ankara", ... "C30":"Madrid"}
    ...

The requirement is to search all keys from C1 to C30 for the value "Berlin" and replace it with "Madrid", but only if "Madrid" is not already present in the row. I.e. id 1 matches on key C3 and id 2 on C2; id 3 will be skipped because C30 already holds this value.

It has to be a single SQL command for PostgreSQL 9.5, run once, considering all keys of the jsonb column.
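
To make the requirement concrete, the expected result for the sample rows above would be (all other keys unchanged):

    id,json_col
    1  {"C1":"Paris","C2":"London","C3":"Madrid","C4":"Tokyo", ... "C30":"Dallas"}
    2  {"C1":"Dublin","C2":"Madrid","C3":"Kiev","C4":"Tokyo", ... "C30":"Phoenix"}
    3  {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Ankara", ... "C30":"Madrid"}   <- skipped, "Madrid" already present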

Erwin Brandstetter
Dmitry

3 Answers


The fastest and simplest way is to modify the column as text:

update table_a
set json_col = replace(json_col::text, '"Berlin"', '"Madrid"')::jsonb
where json_col::text like '%"Berlin"%'
and json_col::text not like '%"Madrid"%'

It's a practical choice. The above query is a find-and-replace operation (like in a text editor) rather than a modification of object attributes. The second option is more complicated and surely much more expensive: even using the fast JavaScript engine (example below), the more formal solution would be many times slower.

You can try Postgres JavaScript (plv8):

create extension if not exists plv8;

create or replace function replace_item(data jsonb, from_str text, to_str text)
returns jsonb language plv8 as $$
    // first pass: check whether the target value is already present
    var found = 0;
    Object.keys(data).forEach(function(key) {
        if (data[key] == to_str) {
            found = 1;
        }
    })
    // second pass: replace only if the target value was not found
    if (found == 0) {
        Object.keys(data).forEach(function(key) {
            if (data[key] == from_str) {
                data[key] = to_str;
            }
        })
    }
    return data;
$$;

update table_a
set json_col = replace_item(json_col, 'Berlin', 'Madrid');
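
For a quick sanity check (assuming plv8 is available), calling the function directly shows both branches:

select replace_item('{"C1":"Berlin","C2":"London"}', 'Berlin', 'Madrid');
-- {"C1": "Madrid", "C2": "London"}
select replace_item('{"C1":"Berlin","C2":"Madrid"}', 'Berlin', 'Madrid');
-- returned unchanged, because "Madrid" is already present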
klin

OK, I have tested all methods and I can say you did a great job. This helped me a lot. Let me share my feedback with you.

Method 1, suggested by klin, works perfectly and is totally fine, except that if a key is named like the value, both the key and the value are replaced, i.e. "Berlin":"Berlin" becomes "Madrid":"Madrid".
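
A minimal illustration of that edge case with the text-replace approach:

select replace('{"Berlin":"Berlin"}'::jsonb::text, '"Berlin"', '"Madrid"')::jsonb;
-- {"Madrid": "Madrid"}  (the key is renamed along with the value)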

Method 2 with the plv8 extension did not work because I am missing the control file; I would have had to install it first, so I just skipped this method and have no feedback on it. The error I was getting was: ERROR: could not open extension control file "/usr/pgsql-9.5/share/extension/plv8.control": No such file or directory

Method 3, similar to Method 2 but with the jsonb_replace_value function, works perfectly; it replaces rows that contain the specific value regardless of the key. And adding the condition

WHERE json_col <> jsonb_replace_value(json_col, '"Berlin"', '"Madrid"')

avoids empty updates and skips rows that do not need to be updated. And something like this

{"Berlin":"Berlin"} becomes {"Berlin":"Madrid"} i.e. Key is not touched, just value

Method 4 is a little more complicated: it uses Method 3 plus indexes. It works totally awesome and is super speedy.
And the NOT EXISTS semi-anti-join indeed forced the index to be used again.
I was shocked how fast it performed!

However, I discovered that all these methods work if the JSON string looks like this: {"key":"value"}. If I have, for example, to update a value that is itself a JSON object, something like this will not be updated:
{"C30":{"id":10044,"value":"Berlin","created_by":"John Doe"}}

MANY THANKS to you guys, @klin and @erwin-brandstetter. This helped me learn something new!

Dmitry
  • *"I was shocked how fast it performed!!!"* Ah, that's the effect I was aiming for. :) As for plv8: You need get the software before you can install it (it's not in the standard distribution). For instance: https://pgxn.org/dist/plv8/ (I am not normally using it myself). Finally, to replace values in nested objects, too, you'd have to do more. I had added a note to that effect. – Erwin Brandstetter Dec 21 '18 at 02:11

What makes this hard is that you are looking for unknown keys holding values of interest. Postgres infrastructure is optimized to find keys (or array values).

This is possibly caused by a sub-optimal table design. The many top-level keys of your jsonb column might be replaced by an array, discarding irrelevant key names altogether. (Or maybe another array for key names.) Or, ideally, a fully normalized DB schema to begin with.

Be that as it may, here is a proof of concept showing how this can be fast and clean with stock Postgres 9.5 or later.

Additional difficulty 1: it's unknown whether duplicate values are possible.
Additional difficulty 2: value frequencies are unknown, too.
Additional difficulty 3: only the first value found is to be replaced and only if the target value is not there yet. Implementing this with set-based operations is possible, but unwieldy. I wrote a plpgsql function instead:

CREATE OR REPLACE FUNCTION jsonb_replace_value(_j jsonb, _old jsonb, _new jsonb)
   RETURNS jsonb AS
$func$
DECLARE
   _key text;
   _val jsonb;
BEGIN
   FOR _key, _val IN
      SELECT * FROM jsonb_each(_j)
   LOOP
      IF _val = _old THEN
         RETURN jsonb_set(_j, ARRAY[_key], _new);  -- update 1st key
      END IF;
   END LOOP;

   RETURN _j;  -- nothing found, return original
END
$func$ LANGUAGE plpgsql IMMUTABLE;

COMMENT ON FUNCTION jsonb_replace_value(jsonb, jsonb, jsonb) IS '
Replace the first occurrence of _old value with _new.
Call:
    SELECT jsonb_replace_value(''{"C1":"Paris","C3":"Berlin","C4":"Berlin"}'', ''"Berlin"'', ''"Madrid"'')';

Could be enhanced to optionally replace all occurrences etc. but that's beyond the scope of this question.
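
For illustration (jsonb stores object keys in sorted order, so C3 is visited before C4 here):

SELECT jsonb_replace_value('{"C1":"Paris","C3":"Berlin","C4":"Berlin"}', '"Berlin"', '"Madrid"');
-- {"C1": "Paris", "C3": "Madrid", "C4": "Berlin"}  -- only the first match is replaced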

Now this would be simple:

UPDATE table_a
SET    json_col = jsonb_replace_value(json_col, '"Berlin"', '"Madrid"'); -- note jsonb literal syntax!

If all rows need an update, we can stop here. It won't get faster. (Except possibly with alternatives like the one demonstrated by @klin.)
If a large percentage of all rows need an update, add a WHERE condition to avoid empty updates:

...
WHERE  json_col <> jsonb_replace_value(json_col, '"Berlin"', '"Madrid"');


Typically, only very few rows actually need an update. Then iterating through all rows with the above query is expensive. We need index support to make it fast, which is not easy for this case. I suggest an expression index based on an IMMUTABLE function extracting the array of values:

CREATE OR REPLACE FUNCTION jsonb_object_val_arr(jsonb)
   RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY (SELECT value FROM jsonb_each_text($1))';

COMMENT ON FUNCTION jsonb_object_val_arr(jsonb) IS '
   Generates text array of values in outermost jsonb object.
   Of limited use if there can be nested objects.';

CREATE INDEX table_a_val_arr_idx ON table_a USING gin (jsonb_object_val_arr(json_col));
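
As a quick illustration of what gets indexed:

SELECT jsonb_object_val_arr('{"C1":"Paris","C2":"Berlin","C3":"Berlin"}');
-- {Paris,Berlin,Berlin}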


Query making use of this index:

UPDATE table_a a
SET    json_col = jsonb_replace_value(a.json_col, '"Berlin"', '"Madrid"')
WHERE  jsonb_object_val_arr(a.json_col) @> '{Berlin}'        -- has Berlin, possibly > 1x ..
-- AND    NOT jsonb_object_val_arr(a.json_col) @> '{Madrid}'
AND    NOT EXISTS (                                          -- .. but not Madrid
   SELECT FROM table_a b
   WHERE  jsonb_object_val_arr(b.json_col) @> '{Madrid}'     -- note array literal syntax
   AND    b.id = a.id
   );

The NOT EXISTS semi-anti-join is carefully drafted to utilize the index a 2nd time.

The commented simpler alternative is faster if there are few rows with 'Berlin' and 'Madrid' - then a filter step in the query plan will be cheaper.

Should be very fast.

db<>fiddle here for Postgres 9.5 demonstrating all.

Erwin Brandstetter