In PostgreSQL 9.3, I am storing some fairly complex JSON objects with arrays nested within arrays. This snippet isn't the real data, but illustrates the same concept:
{
"customerId" : "12345",
"orders" : [{
"orderId" : "54321",
"lineItems" : [{
"productId" : "abc",
"qty" : 3
}, {
"productId" : "def",
"qty" : 1
}]
}
}
I want the ability for SQL queries to operate on lineItem
objects... not just within this single JSON structure, but across all JSON objects in that table column. For instance, a SQL query that returns all distinct productId
's, and their total sold qty
sums. To prevent such a query from taking all day, I would probably want an index on lineItem
or its child fields.
Using this StackOverflow question, I figured out how to write a query that works:
SELECT
line_item->>'productId' AS product_id,
SUM(CAST(line_item->>'qty' AS INTEGER)) AS qty_sold
FROM
my_table,
json_array_elements(my_table.my_json_column->'orders') AS order,
json_array_elements(order->'lineItems') AS line_item
GROUP BY product_id;
However, that original StackOverflow question dealt with data that was only nested one level deep rather than two. I extended the same concept (i.e. a "lateral join" within the FROM
clause) by adding an extra lateral join to dive one level deeper. However, I'm not sure if this is the best approach, so the first part of my question is: What's the best approach for querying JSON data that is an arbitrary number of levels deep in the JSON objects?
For the second part of this, creating an index on such nested data, this StackOverflow question again deals with data nested only one level deep. However, I'm just completely lost, with my head swimming trying to think of how I would apply this to a deeper number of levels. Can anyone please offer a clear approach for indexing data that is at least two levels deep, as with lineItems
above?