How to turn a simple json(b) int array into an integer[] in PostgreSQL 9.4+

Question

I have this array from a json object: [1, 9, 12]

Since it uses the square bracket notation because it is fetched directly from a json object I cannot cast it to ::integer[] and when I try to use array_agg(jsonb_array_elements(simpleintarray)) I get an error saying I need to group by id, but since the array is not objects (key/value) pairs but just simple integers, I cannot see how to achieve this in a rather efficient manner.

The query returning the above simple int array from json is:

SELECT node.*, elem->'permissions' AS group_node_permissions
    FROM node
    LEFT OUTER JOIN
    jsonb_array_elements(my_user_group.node_permissions) elem
    ON elem->>'id' = node.id::text
    ORDER BY node.id

elem->'permissions' should ideally be returned as a Postgres array in the format {} so I can later use the ANY(intarray) function on it.

I'd like to avoid doing redundant inefficient workarounds such as turning elem->'permissions' into a string with ->> replacing square brackets with curly brackets and then casting into a integer array, although that would probably work.

In pseudo code what I really need is to be able to get this same result:

SELECT node.*, elem->'permissions'**::integer[]** AS group_node_permissions,

... but of course due to the [] vs {} difference from the json array to the PostgreSQL array format, this will result in an error.

This is my current (VERY ugly solution which works):

SELECT node.*, replace(replace(elem->>'permissions', '[', '{'),']','}')::integer[] AS group_node_permissions

It turns the original [1, 9, 12] (jsonb) into the form {1,9,12} (integer[])

Are there any better solutions for this?

P.S.

Is it even worth converting from the json(b) to an int array ([]) where you can fetch the array elements with jsonarray @> '12' to the Postgres integer[] array where you can use 12 = ANY(intarray). Has anyone got any idea of which is more performant and should scale better? Now that we can put arrays in columns of the jsonb data type, is this considered the preferred way over eg. the integer[] data type?

Extended info (per request by Erwin):

SELECT DISTINCT ON (my_node.id) my_node.*
FROM user_group AS my_user_group,
LATERAL
(
    SELECT node.*, elem->'permissions' AS user_group_node_permissions
    FROM node
    LEFT OUTER JOIN
    jsonb_array_elements(my_user_group.node_permissions) elem
    ON elem->>'id' = node.id::text
    ORDER BY node.id
)my_node
WHERE (my_user_group.id = ANY('{2,3}')) --try also with just: ANY('{3}')) to see node 3 is excluded
AND (user_group_node_permissions @> '12' OR (user_group_node_permissions IS NULL AND 12 = ANY(my_user_group.default_node_permissions)));

DDL:

CREATE TABLE node
(
  id bigserial NOT NULL,
  path ltree,
  name character varying(255),
  node_type smallint NOT NULL,
  created_by bigint NOT NULL,
  created_date timestamp without time zone NOT NULL DEFAULT now(),
  parent_id bigint,
  CONSTRAINT node_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE TABLE user_group
(
  id serial NOT NULL,
  name character varying,
  alias character varying,
  node_permissions jsonb,
  section_ids jsonb,
  default_node_permissions jsonb
)
WITH (
  OIDS=FALSE
);

DML:

Node:

INSERT INTO node VALUES (1, '1', 'root', 5, 1, '2014-10-22 16:51:00.215', NULL);

INSERT INTO node VALUES (2, '1.2', 'Home', 1, 1, '2014-10-22 16:51:00.215', 1);
INSERT INTO node VALUES (3, '1.2.3', 'Sample Page', 1, 1, '2014-10-22 16:51:00.215', 2);
INSERT INTO node VALUES (4, '1.2.3.4', 'Child Page Level 1', 1, 1, '2014-10-26 23:19:44.735', 3);
INSERT INTO node VALUES (5, '1.2.3.4.5', 'Child Page Level 2', 1, 1, '2014-10-26 23:19:44.735', 4);

INSERT INTO node VALUES (6, '1.2.6', 'Test Page', 1, 1, '2014-12-01 11:45:16.186', 2);
INSERT INTO node VALUES (7, '1.2.7', 'Login', 1, 1, '2014-12-01 11:54:10.208', 2);
INSERT INTO node VALUES (8, '1.2.7.8', 'MySubPage', 1, 1, '2014-12-01 12:02:54.252', 7);
INSERT INTO node VALUES (9, '1.2.9', 'Yet another test page', 1, 1, '2014-12-01 12:07:29.999', 2);

INSERT INTO node VALUES (10, '1.2.10', 'Testpage 2', 1, 1, '2014-12-02 01:43:33.233', 2);
INSERT INTO node VALUES (11, '1.2.10.11', 'Test page 2 child', 1, 1, '2014-12-02 01:45:49.78', 10);

Group:

INSERT INTO user_group VALUES (2, 'Editor', 'editor', NULL, NULL, '{1,2,3,4,5,7,9,10,12}');
INSERT INTO user_group VALUES (1, 'Administrator', 'administrator', NULL, NULL, '{1,2,3,4,5,6,7,8,9,10,11,12}');
INSERT INTO user_group VALUES (3, 'Writer', 'writer', '[{"id": 3, "permissions": [1, 9]}, {"id": 4, "permissions": [1, 9, 12]}]', NULL, '{1,3,9,12}');

Short description:

Basically what I'm doing here is this:

A user can have multiple groups (either as the integer[] datatype or the jsonb array [] - not decided yet, but taking Erwin's answer in consideration, integer might be best, since it should not hold a large array.
Each group can be assigned specific access to specific nodes* (see image below, which explains the outer left join), thus overruling the groups default global permissions (permission "12" is btw the ability to browse the node and thereby get it returned in the query)
Since the "writer" group does have "12" (browse) permission, BUT the node permissions for the node with id 3 does not have permission "12", the user with only group "writer" will not get the node with id 3 in the results returned by the select query. If however the user also has another group, and that does not exclude the node - that node will of course be returned, since more "powerful" groups superseeds the "weaker" ones.

Slow performance - can it be optimized?

EXPLAIN ANALYZE query performance

Text output

(You can zoom in on the pictures in the browser)

Compared to the above, a simple SELECT * FROM node executes in 0.046ms (measured with EXPLAIN ANALYZE again)

If you still could use even more info, please feel free to ask.

There is an invalid reference to a table `my_user_group` in the `LEFT JOIN`. Can you please clarify that? And please add (the relevant parts of) table definitions for involved tables (`\d tbl` in psql). More details are needed. — Erwin Brandstetter, Feb 01 '15 at 02:01
I've added table definitions and a little more description. The query does not perform very well when returning just 60 nodes (see EXPLAIN ANALYZE screenshot above), so I'd of course be very interested in hearing your opinion about where I could possibly optimize, or whether or not aggregate queries like these should be moved from PostgreSQL to the Web Application itself for more efficienty (I'm developing in Go). My reason for writing this query is that I know in Go how many iterations such a query would cause - in Postgres I can only hope things are more efficient, but I'm new to pg ;-p — Dac0d3r, Feb 01 '15 at 10:21
Graphical explain from pgAdmin is nice for an overview, but the text output carries all the details. We don't get tooltips from a picture. Similar for table data: We cannot copy/paste from an image. Rather post text in both cases. Also, I was under the impression you wanted to extract arrays into a new table - but now it seems you want to work off the original table? — Erwin Brandstetter, Feb 01 '15 at 22:00
Of course. Thanks for the advice! I've now updated the post with DML for the tables and the text output of the explain. Basically I'm trying to utilize pg to avoid having huge junction-tables when setting specific node permissions per user or group. The query seems to do all that as it is right now - but I can't help but think that this is perhaps asking too much of pg, in terms of raw power and work involved to filter nodes based on permissions like this? Originally I just wanted to take my jsonb int array and return it (cast it) to integer[] in the returned result-set. :-) — Dac0d3r, Feb 02 '15 at 10:58

score 4 · Accepted Answer · edited May 23 '17 at 12:32

Obviously you have a JSON array nested inside an outer JSON array:

SELECT n.*, array_agg(p)::int[] AS group_node_permissions
FROM   my_user_group u
     , jsonb_array_elements(u.node_permissions) elem
JOIN   node n ON n.id = (elem->>'id')::int
     , jsonb_array_elements_text(elem->'permissions') p
GROUP  BY n.id;  -- id being the PK

This is assuming there are no empty arras in permissions. Else you need LEFT JOIN LATERAL ... ON TRUE:
- Call a set-returning function with an array argument multiple times
This should preserve the original order of the JSON array, but there are no guarantees. If you need to make sure, use WITH ORDINALITY.
- PostgreSQL unnest() with element number
LEFT [OUTER] JOIN would be pointless, since the later predicate on a column of the left table forces [INNER] JOIN behavior anyways.

Related answer on dba.SE with more details and explanation:

How to turn json array into Postgres array?

Depending on details of the use case, it might be a good idea to support the query with a GIN index:

Index for finding an element in a JSON array

As for your P.S., it depends on the complete picture. All other considerations aside a Postgres array is typically a bit smaller and faster than jsonb holding a JSON array. Testing for existence of an element can be very fast with with a GIN index either way:

jsonarray @> '12'
intarray @> '{12}'

Note in particular, that the variant 12 = ANY(intarray) is not supported by a GIN index. Details in the manual.

Is `ORDER BY n.id` a typo? I think you may wanted to write `GROUP BY n.id`. -- An array constructor (f.ex. `array(select jsonb_array_elements_text(elem->'permissions')::int)`) should be faster than the last join + aggregate combo (& it also solves the empty array in `permissions` problem) — pozs, Feb 02 '15 at 10:28

How to turn a simple json(b) int array into an integer[] in PostgreSQL 9.4+

1 Answers1

Linked