0

I have a two part question

We have a PostgreSQL table with a jsonb column. The values in jsonb are valid jsons, but they are such that for some rows a node would come in as an array whereas for others it will come as an object.

for example, the json we receive could either be like this ( node4 I just an object )

"node1": {
    "node2": {
        "node3": {
            "node4": {
                "attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
                "attr2": "S1",
                "UserID": "WebServices",
                "attr3": "S&P 500*",
                "attr4": "EI",
                "attr5": "0"
            }
        }
    }
}

Or like this ( node4 is an array )

"node1": {
    "node2": {
        "node3": {
            "node4": [
                {
                    "attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
                    "attr2": "S1",
                    "UserID": "WebServices",
                    "attr3": "S&P 500*",
                    "attr4": "EI",
                    "attr5": "0"
                },
                {
                    "attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
                    "attr2": "S1",
                    "UserID": "WebServices",
                    "attr3": "S&P 500*",
                    "attr4": "EI",
                    "attr5": "0"
                }
            ]
        }
    }
}

And I have to write a jsonpath query to extract, for example, attr1, for each PostgreSQL row containing this json. I would like to have just one jsonpath query that would always work irrespective of whether the node is object or array. So, I want to use a path like below, assuming, if it is an array, it will return the value for all indices in that array.

jsonb_path_query(payload, '$.node1.node2.node3.node4[*].attr1')#>> '{}' AS "ATTR1"

I would like to avoid checking whether the type in array or object and then run a separate query for each and do a union.

Is it possible?

A sub-question related to above - Since I needed the output as text without the quotes, and somewhere I saw to use #>> '{}' - so I tried that and it is working, but can someone explain, how that works?

The second part of the question is - the incoming json can have multiple sets of nested arrays and the json and the number of nodes is huge. So other part I would like to do is flatten the json into multiple rows. The examples I found were one has to identify each level and either use cross join or unnest. What I was hoping is there is a way to flatten a node that is an array, including all of the parent information, without knowing which, if any, if its parents are arrays or simple object. Is this possible as well?

Update

I tried to look at the documentation and tried to understand the #>> '{}' construct, and then I came to realise that '{}' is the right hand operand for the #>> operator which takes a path and in my case the path is the current attribute value hence {}. Looking at examples that had non-empty single attribute path helped me realise that.

Thank you

adbdkb
  • 1,897
  • 6
  • 37
  • 66
  • Do you want to return two rows in the second example, or an array of IDs in a single row? –  Oct 28 '22 at 17:24
  • Two rows. Will have other attributes in a similar condition, so total number of rows would be the product of all – adbdkb Oct 28 '22 at 17:29

2 Answers2

0

You can use a "recursive term" in the JSON path expression:

select t.some_column,
       p.attr1
from the_table t
  cross join jsonb_path_query(payload, 'strict $.**.attr1') as p(attr1)

Note that the strict modifier is required, otherwise, each value will be returned multiple times.

This will return one row for each key attr1 found in any level of the JSON structure.

For the given sample data, this would return:

attr1                                 
--------------------------------------
"7d181b05-9c9b-4759-9368-aa7a38b0dc69"
"7d181b05-9c9b-4759-9368-aa7a38b0dc69"
"7d181b05-9c9b-4759-9368-aa7a38b0dc69"
  • Thanks. So, I presume this will work, even if any node in the path is a nested array, right? Also, can you help me with the second part of the question? The reason for me to try to do that way also is the attribute names like 'ID' attributes can be under multiple paths and in that case the `$.**.ID` path may not work, right? – adbdkb Oct 28 '22 at 18:42
  • a_horse_with_no_name - Can you help with how to handle the scenario when multiple nodes can have the same attribute name like ID – adbdkb Oct 28 '22 at 23:58
0

"I would like to avoid checking whether the type in array or object and then run a separate query for each and do a union. Is it possible?"

Yes it is and your jsonpath query works fine in both cases either when node4 is a jsonb object or when it is a jsonb array because the jsonpath wildcard array accessor [*] also works with a jsonb object in the lax mode which is the default behavior (but not in the strict mode see the manual). See the test results in dbfiddle.

"I saw to use #>> '{}' - so I tried that and it is working, but can someone explain, how that works?"

The output of the jsonb_path_query function is of type jsonb, and when the result is a jsonb string, then it is automatically displayed with double quotes " in the query results. The operator #>> converts the output into the text type which is displayed without " in the query results and the associated text array '{}' just point at the root of the passed jsonb data.

" the incoming json can have multiple sets of nested arrays and the json and the number of nodes is huge. So other part I would like to do is flatten the json into multiple rows"

you can refer to the answer of a_horse_with_no_name using the recursive wildcard member accessor .**

Edouard
  • 6,577
  • 1
  • 9
  • 20
  • Thanks. @a_horse_with_no_name's answer works. In the comment I have a question if the attribute we are looking for is the "same name" under different paths, for example "ID" attribute, then will the construct `'strict $.**.ID'` still work for any ID attribute under various paths. Also the dbfiddle was very useful to understand explanation about the first part of the question - I tried this path - `'$.node1[*].node2.node3[*].node4[*]')->>'attr1'` and it worked – adbdkb Oct 28 '22 at 19:16