How to query and index JSON data nested multiple levels deep in PostgreSQL 9.3+?

Question

In PostgreSQL 9.3, I am storing some fairly complex JSON objects with arrays nested within arrays. This snippet isn't the real data, but illustrates the same concept:

{
   "customerId" : "12345",
   "orders" : [{
      "orderId" : "54321",
      "lineItems" : [{
         "productId" : "abc",
         "qty" : 3
      }, {
         "productId" : "def",
         "qty" : 1
      }]
   }
}

I want the ability for SQL queries to operate on lineItem objects... not just within this single JSON structure, but across all JSON objects in that table column. For instance, a SQL query that returns all distinct productId's, and their total sold qty sums. To prevent such a query from taking all day, I would probably want an index on lineItem or its child fields.

Using this StackOverflow question, I figured out how to write a query that works:

SELECT
   line_item->>'productId' AS product_id, 
   SUM(CAST(line_item->>'qty' AS INTEGER)) AS qty_sold
FROM
   my_table, 
   json_array_elements(my_table.my_json_column->'orders') AS order,
   json_array_elements(order->'lineItems') AS line_item
GROUP BY product_id;

However, that original StackOverflow question dealt with data that was only nested one level deep rather than two. I extended the same concept (i.e. a "lateral join" within the FROM clause) by adding an extra lateral join to dive one level deeper. However, I'm not sure if this is the best approach, so the first part of my question is: What's the best approach for querying JSON data that is an arbitrary number of levels deep in the JSON objects?

For the second part of this, creating an index on such nested data, this StackOverflow question again deals with data nested only one level deep. However, I'm just completely lost, with my head swimming trying to think of how I would apply this to a deeper number of levels. Can anyone please offer a clear approach for indexing data that is at least two levels deep, as with lineItems above?

score 2 · Accepted Answer · answered May 10 '14 at 18:57

To deal with the infinite recursion issue, you need to use a recursive CTE to operate upon each individual json element in each table row:

WITH RECURSIVE

raw_json as (

  SELECT 

    *

  FROM

  (VALUES 

  (1,
  '{
  "customerId": "12345",
  "orders": [
    {
      "orderId": "54321",
      "lineItems": [
        {
          "productId": "abc",
          "qty": 3
        },
        {
          "productId": "def",
          "qty": 1
        }
      ]
    }
  ]
}'::json),

  (2,
  '{
  "customerId": "678910",
  "artibitraryLevel": {
    "orders": [
      {
        "orderId": "55345",
        "lineItems": [
          {
            "productId": "abc",
            "qty": 3
          },
          {
            "productId": "ghi",
            "qty": 10
          }
        ]
      }
    ]
  }
}'::json)



) a(id,sample_json)

),


json_recursive as (

  SELECT
    a.id,
    b.k,
    b.v,
    b.json_type,
    case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id, --track any arbitrary id when iterating through json graph
    case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
    case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id

  FROM

    (

      SELECT

        id,
        sample_json v,
        case left(sample_json::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type, --because choice of json accessor function depends on this, and for some reason postgres has no built in function to get this value
        sample_json->>'customerId' customer_id,
        sample_json->>'orderId' order_id,
        sample_json->>'productId' product_id

      FROM

        raw_json
    ) a
    CROSS JOIN LATERAL (

      SELECT

        b.k,
        b.v,
        case left(b.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type


      FROM

        json_each(case json_type when 'object' then a.v else null end ) b(k,v) --get key value pairs for individual elements if we are dealing with standard object

     UNION ALL


      SELECT

        null::text k,
        c.v,
        case left(c.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type


      FROM

        json_array_elements(case json_type when 'array' then a.v else null end) c(v) --if we have an array, just get the elements and use parent key


    ) b


UNION ALL --recursive term

    SELECT
    a.id,
    b.k,
    b.v,
    b.json_type,
    case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id,
    case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
    case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id




  FROM

    json_recursive a
    CROSS JOIN LATERAL (

      SELECT

        b.k,
        b.v,
        case left(b.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type


      FROM

        json_each(case json_type when 'object' then a.v else null end ) b(k,v)


     UNION ALL


      SELECT

        a.k,
        c.v,
        case left(c.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type


      FROM

        json_array_elements(case json_type when 'array' then a.v else null end) c(v)

    ) b

)

Then you can either sum "qty" by an arbitrary id...

SELECT
  customer_id,
  sum(v::text::integer)

FROM

  json_recursive

WHERE

  k = 'qty'

GROUP BY 

  customer_id

Or you can get the "lineItem" objects and manipulate them as you desire:

SELECT 

  *

FROM 

  json_recursive 

WHERE

  k = 'lineItems' and json_type = 'object'

As for the indexing, you could adapt the recursive query into a function that returns the unique keys for each json object in each row of your original table and then it to create a functional index on your json column:

SELECT

  array_agg(DISTINCT k)

FROM

  json_recursive

WHERE

  not k is null

How to query and index JSON data nested multiple levels deep in PostgreSQL 9.3+?

1 Answers1