
Querying Postgres 9.3 by JSON field is really great. However, I couldn't find a formal way to update the JSON object, so I use an internal function written in plpythonu, based on a previous post (How do I modify fields inside the new PostgreSQL JSON datatype?):

CREATE OR REPLACE FUNCTION json_update(data json, key text, value json)
RETURNS json AS
$BODY$
   from json import loads, dumps
   if key is None: return data
   js = loads(data)
   js[key] = value  # 'value' arrives as a string and is stored as-is
   return dumps(js)
$BODY$
LANGUAGE plpythonu VOLATILE;

It works really well as long as my JSON updates remain flat and simple. Say "chat" is a json-typed field in the "GO_SESSION" table and contains {"a":"1","b":"2"}; the following code will change the value of 'b' and turn "chat" into {"a":"1","b":"5"}:

update "GO_SESSION" set chat=json_update(chat,'b','5') where id=3

The problem arises when I try to assign 'b' another object rather than a simple value:

update "GO_SESSION" set chat=json_update(chat,'b','{"name":"steve"}') where id=3

The result in the database is that 'b' contains an escaped string rather than a real JSON object:

{"a": "1", "b": "{\"name\":\"steve\"}"}

I have tried different ways to unescape or dump my json in order to keep 'b' an object, but couldn't find a solution.

Thank you

user3193043
  • Curious to know… who is writing the tutorials that teach users to create these kinds of batshit crazy schemas? – Denis de Bernardy Jan 14 '14 at 08:48
  • This question is based on a previous post: http://stackoverflow.com/questions/18209625/how-do-i-modify-fields-inside-the-new-postgresql-json-datatype – user3193043 Jan 14 '14 at 09:39
  • Might be, but updating JSON that way is still dubious, to put it mildly. If you need to update individual pieces of data stored in JSON, that data should almost certainly be *actual* fields in your table. – Denis de Bernardy Jan 14 '14 at 09:48
  • The efficiency of this structure depends very much on the way it's being used. For example, if my current JSON field is designated to handle heavy loads of "read" requests and far fewer "writes", using the Postgres json datatype this way serves the purpose very well. – user3193043 Jan 14 '14 at 15:27
  • If by "read" you do not mean "read the entire json data without ever wondering what's inside until it's out of the database", methinks you'll be in for big surprises down the road. Speaking of which, from an internals viewpoint, you might read the entire field in your app, edit it in there, and update the entire field afterwards… That's what Postgres is doing anyway when you use your function; the only difference is, your seeking to manipulate json from within PG loosely amounts to re-inventing a poorly performing version of the `update` statement. – Denis de Bernardy Jan 14 '14 at 16:02
  • Yes, the point of this JSON is to be transmitted from PG straight to the client side. Although I haven't checked it, at least in theory a stored procedure like this can be quicker than doing it at the application level. Yet you have a point about the model being loose and kind of a 'workaround'; it's definitely taken into consideration while using it in moderation. – user3193043 Jan 14 '14 at 17:20
  • What I'm trying to get at is that, if PG ever needs to treat the json as anything but an opaque blob, your schema will eventually blow up in your face and you'll be better off storing the data within it in proper fields. This, for the same reasons as with hstore, xml, an EAV store, etc. — see http://stackoverflow.com/questions/18799704/are-postgres-json-indexes-efficient-enough-compared-with-classic-normalized-tabl/18801474#18801474 – Denis de Bernardy Jan 14 '14 at 17:34

3 Answers


No eval is required. Your issue is that you're not decoding the value as a json object.

CREATE OR REPLACE FUNCTION json_update(data json, key text, value json)
RETURNS json AS
$BODY$
   from json import loads, dumps
   if key is None: return data
   js = loads(data)
   # you must decode 'value' with loads too:
   js[key] = loads(value)
   return dumps(js)
$BODY$
LANGUAGE plpythonu VOLATILE;

postgres=# SELECT json_update('{"a":1}', 'a', '{"innerkey":"innervalue"}');
            json_update            
-----------------------------------
 {"a": {"innerkey": "innervalue"}}
(1 row)

Not only that, but using eval to decode JSON is dangerous and unreliable. It's unreliable because JSON isn't Python; it just happens to evaluate a little bit like it much of the time. It's unsafe because you never know what you might be eval'ing. In this case you are largely protected by PostgreSQL's json parser:

postgres=# SELECT json_update(
postgres(#    '{"a":1}', 
postgres(#    'a', 
postgres(#    '__import__(''shutil'').rmtree(''/glad_this_is_not_just_root'')'
postgres(# );
ERROR:  invalid input syntax for type json
LINE 4:          '__import__(''shutil'').rmtree(''/glad_this_is_not_...
                 ^
DETAIL:  Token "__import__" is invalid.
CONTEXT:  JSON data, line 1: __import__...

... but I won't be at all surprised if someone can slip an eval exploit past that. So the lesson here: don't use eval.

Craig Ringer

Solved

The problem with the above plpythonu function is that it treats "value" as a string, no matter whether it's actually a complex JSON object. The key to solving it is to add eval() around value:

js[key] = eval(value)

That way the JSON string (named 'value' in this example) loses its outer enclosing double quotes ("{...}" becomes {...}) and becomes an object.
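
That said, the accepted answer's warning applies here: eval() only works while the JSON also happens to be valid Python. A sketch of where it breaks, with the error text approximated:

SELECT json_update('{"a":1}', 'a', 'null');
-- ERROR:  NameError: name 'null' is not defined
-- (Python spells these literals None/True/False, not null/true/false)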

user3193043

For people who want plv8 (a trusted language usable on services like Heroku): I often need to do migrations or updates to JSON blobs, and running a query directly on the DB is much faster than downloading all the data, transforming it, and then posting an update.

CREATE EXTENSION plv8;
CREATE OR REPLACE FUNCTION json_replace_string(obj json, path text, value text, force boolean)
RETURNS json AS $$
// Skip no-op updates unless the caller explicitly forces writing null.
if (value === null && !force) {
  return obj;
}
// Split the path on '.' and '[' after stripping closing brackets;
// the capturing group keeps the separators in the result array.
var nestedRe = /(\.|\[)/;
var scrub = /]/g;
path = path.replace(scrub, '');
var pathBits = path.split(nestedRe);
var len = pathBits.length;
var layer = obj;
// Walk the path two entries at a time: a key, then the separator after it.
for (var i = 0; i < len; i += 2) {
  if (layer === null || layer === undefined) return obj;
  var key = pathBits[i];
  if (key === '') continue;
  if (i === len - 1) {
    // Last path component: write the value.
    layer[key] = value;
  } else {
    // With force, create the missing container: '.' means the next level
    // is an object, '[' means it is an array.
    if (force && typeof layer[key] === 'undefined') {
      layer[key] = pathBits[i+1] === '.' ? {} : [];
    }
    layer = layer[key];
  }
}
return obj;
$$ LANGUAGE plv8 IMMUTABLE;

You can use it like so:

UPDATE my_table
SET blob=json_replace_string(blob, 'some.nested.path[5].to.object', 'new value', false)
WHERE some_condition;

The force parameter serves two functions. (1) It lets you set a null value: if you are dynamically generating the value from other columns that may not exist, e.g. blob->'non_existent_value', then NULL will be passed into the function, and you probably don't mean to set the value to null, so nulls are only written when forced. (2) It forces the creation of the nested path if it doesn't already exist in the JSON object you are mutating, e.g.

json_replace_string('{"some_key": "some_val"}', 'other_key', 'new_val', true)

gives

{"some_key": "some_val", "other_key": "new_val"}

You can imagine similar functions to update numeric values, delete keys, etc. (a sketch of one follows below). This basically enables Mongo-like functionality inside Postgres during the early stages of new features, for quick prototyping; as our schema stabilizes, we break things out into independent columns and tables to get the best performance.
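
For instance, here is a minimal sketch of a delete-key companion under the same plv8 setup; the function name json_delete_key is my own, not from the answer:

CREATE OR REPLACE FUNCTION json_delete_key(obj json, key text)
RETURNS json AS $$
// plv8 hands the json argument over as a JavaScript object,
// so deleting a top-level key and returning the object is enough.
delete obj[key];
return obj;
$$ LANGUAGE plv8 IMMUTABLE;

SELECT json_delete_key('{"some_key": "some_val", "other_key": "new_val"}', 'other_key');
-- expected: {"some_key": "some_val"}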

Ted Tomlinson