I don't have any experience with this, but it got me curious so I did some reading.
JSON only
What about something like the following (untested, BTW)? It doesn't address your question about storing a binary representation of the JSON; instead it tries to parse all of the JSON at once, for every row you're checking, in the hope that avoiding a separate parse per row will improve performance. If it succeeds at that, I'd expect it to use more memory, though.
The CREATE TYPE ... set_of_records() stuff is adapted from the example on the wiki where it mentions that "You can also return records with an array of JSON." I guess that really means "an array of objects".
Is the id value from the DB record embedded in the JSON?
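For reference, here's the kind of table both versions below assume. The table and column names match the queries in the code; the sample rows are just made up, on the assumption that each data value is a flat JSON object with a name property:

CREATE TABLE things (
    id   serial PRIMARY KEY,
    data text NOT NULL  -- one JSON object per row
);

INSERT INTO things ( data ) VALUES
    ( '{"name":"Zaphod"}' ),
    ( '{"name":"Arthur"}' );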
Version #1
CREATE TYPE rec AS ( id integer, data text, name text );

CREATE FUNCTION set_of_records() RETURNS SETOF rec AS
$$
    var records = plv8.execute( "SELECT id, data FROM things" );
    var data = [];

    // Collect each row's JSON text (a plain for loop may be faster)
    records.forEach( function ( rec ) {
        data.push( rec.data );
    } );

    // Parse all of the JSON in one call instead of once per row
    data = "[" + data.join( "," ) + "]";
    data = JSON.parse( data );

    // Copy the parsed "name" back onto each row so the caller can filter on it
    records.forEach( function ( rec, i ) {
        rec.name = data[ i ].name;
    } );

    return records;
$$
LANGUAGE plv8;

SELECT id, data FROM set_of_records() WHERE name LIKE 'Z%';
Version #2
This one gets Postgres to aggregate / concatenate some values to cut down on the processing done in JS.
CREATE TYPE rec AS ( id integer, data text, name text );

CREATE FUNCTION set_of_records() RETURNS SETOF rec AS
$$
    // Note the trailing spaces: the pieces are concatenated into one SQL string
    var cols = plv8.execute(
        "SELECT " +
        "array_agg( id ORDER BY id ) AS id, " +
        "string_agg( data, ',' ORDER BY id ) AS data " +
        "FROM things"
    )[0];

    // Parse all of the concatenated JSON in one call
    cols.data = JSON.parse( "[" + cols.data + "]" );

    var records = cols.id;

    // Replace each id in place with a full record object
    // (a plain for loop may be faster)
    records.forEach( function ( id, i, arr ) {
        arr[ i ] = {
            id   : id,
            // Re-serialize so the text column gets JSON text, not a JS object
            data : JSON.stringify( cols.data[ i ] ),
            name : cols.data[ i ].name
        };
    } );

    return records;
$$
LANGUAGE plv8;

SELECT id, data FROM set_of_records() WHERE name LIKE 'Z%';
hstore
How would the performance of this compare: duplicate the JSON data into an hstore column at write time (or, if the performance somehow managed to be good enough, convert the JSON to hstore at select time) and use the hstore in your WHERE clause, e.g.:
SELECT id, data FROM things WHERE hstore_data -> 'name' LIKE 'Z%';
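A rough sketch of what the write-time conversion could look like. The json_to_hstore() function and the hstore_data column are names I made up, it relies on plv8 coercing the returned string through hstore's text input format, and it assumes the JSON is a flat object of scalar values without embedded quotes:

CREATE EXTENSION IF NOT EXISTS hstore;

ALTER TABLE things ADD COLUMN hstore_data hstore;

CREATE FUNCTION json_to_hstore( data text ) RETURNS hstore AS
$$
    // Build hstore's "key" => "value" text form from a flat JSON object
    var obj = JSON.parse( data );
    var pairs = [];
    for ( var key in obj ) {
        pairs.push( '"' + key + '" => "' + obj[ key ] + '"' );
    }
    return pairs.join( ", " );
$$
LANGUAGE plv8 IMMUTABLE STRICT;

-- Populate the copy at write time, or backfill existing rows:
UPDATE things SET hstore_data = json_to_hstore( data );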
I heard about hstore from here: http://lwn.net/Articles/497069/
The article mentions some other interesting things:
PL/v8 lets you...create expression indexes on specific JSON elements and save them, giving you stored search indexes much like CouchDB's "views".
It doesn't elaborate on that and I don't really know what it's referring to.
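My guess is that it means something like the following: an IMMUTABLE plv8 function that extracts one JSON field, plus an expression index on it, so the parsing happens once at index time instead of on every query. The function and index names are my own, and I'm again assuming a name key in the JSON:

CREATE FUNCTION json_get_name( data text ) RETURNS text AS
$$
    // Return just the "name" property of the JSON object
    return JSON.parse( data ).name;
$$
LANGUAGE plv8 IMMUTABLE STRICT;

-- text_pattern_ops lets a b-tree index serve LIKE 'Z%' prefix searches
CREATE INDEX things_name_idx ON things ( json_get_name( data ) text_pattern_ops );

SELECT id, data FROM things WHERE json_get_name( data ) LIKE 'Z%';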
There's a comment attributed to "jberkus" that says:
We discussed having a binary JSON type as well, but without a protocol to transmit binary values (BSON isn't at all a standard, and has some serious glitches), there didn't seem to be any point.
If you're interested in working on binary JSON support for PostgreSQL, we'd be interested in having you help out ...