5

In an example table:

CREATE TABLE example (
    id   SERIAL PRIMARY KEY,
    data JSON NOT NULL );
INSERT INTO example (id, data) VALUES
  (1, '[{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}]')
, (2, '[{"key": "1", "value": "val1"}]')
, (3, '[{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}]');

I want to query the value field in the data column where key = 2. The query I'm currently using is this:

SELECT id,
       jsonb_path_query(
               TO_JSONB(data),
               '$[*] ? (@.key == "2").value'::JSONPATH
           )::VARCHAR AS values
FROM example

I would expect the results to be:

id values
1 "val2"
2 null
3 "val2"

But the actual result is:

id values
1 "val2"
3 "val2"

Is there a reason why the null output of jsonb_path_query() is omitted?
How do I get it to behave the way I'm expecting?

Zegarek
  • 6,424
  • 1
  • 13
  • 24
Isaac N
  • 159
  • 1
  • 10
  • 1
    If you want to use JSON path, then why is your column not defined as `jsonb`? Then you can also get rid of the casting. `jsonb` is the recommended data type for JSON values anyway. –  May 31 '21 at 09:59

4 Answers4

5

Scalar function processes a value and it can nullify it, but a set-returning function generates values, so it can end up not generating one at all. As already suggested, you can use the scalar function

SELECT id, jsonb_path_query_first(data::jsonb, '$[*]?(@.key=="2").value')
FROM example;

Alternatively, you can keep the SRF, Set-Returning Function jsonb_path_query()→ setof jsonb by wrapping it in a scalar subquery or replacing its implicit inner join. Both will indicate lack of generated values with a null, but the latter can also still spawn and return multiple: demo

SELECT id,(SELECT jsonb_path_query(data::jsonb,'$[*]?(@.key=="2").value')LIMIT 1)
FROM example;

SELECT id, values
FROM example LEFT JOIN jsonb_path_query(data::jsonb, '$[*]?(@.key=="2").value') 
                       AS srf(values) 
             ON true;

Row 2 wasn't generating any rows in that function, which is why you didn't see it - there was no row to have a null in. Select section is not a normal place for an SRF - when applied there, it ends up pushed down to from section to be inner joined with. And inner join with an empty set, is an empty set. That, combined with implicit join beahaviour, is why you can read:

  • SELECT srf(); as an actual SELECT b.x FROM srf() AS b(x);

  • SELECT a.id, srf() FROM a; meaning SELECT a.id, b.x FROM a, srf() AS b(x);

  • SELECT a.id, b.x FROM a, srf(a.v) AS b(x); and SELECT a.id, srf(a.v) FROM a; actually meaning:

    SELECT a.id, b.x FROM a INNER JOIN srf(a.v) AS b(x) ON true;

This INNER JOIN threw out your ids that didn't have result-generating data. LEFT JOIN won't.


Another illustration with a more obvious generate_series() "give me this many rows" SRF: demo

create table table1(id smallserial, how_many_to_generate int);
insert into table1(how_many_to_generate) values (1),(0),(2),(0),(1);

--no rows for the 2nd and 4th entry where `how_many_to_generate` is 0
--on the other hand, the 3rd entry requested and received two rows
select id, how_many_to_generate, generate_series(1, how_many_to_generate)
from table1;
-- id | how_many_to_generate | generate_series
------+----------------------+-----------------
--  1 |                    1 |               1
--  3 |                    2 |               1
--  3 |                    2 |               2
--  5 |                    1 |               1
Zegarek
  • 6,424
  • 1
  • 13
  • 24
3

You want jsonb_path_query_first() if you want the result of the path expression:

SELECT id,
       jsonb_path_query_first(data, '$[*] ? (@.key == "2").value') AS values
FROM example

Note that this returns a jsonb value. If you want a text value, use:

jsonb_path_query_first(data, '$[*] ? (@.key == "2").value') #>> '{}
  • Hi, what does `#>> '{}'` do? Or how does it work? I am trying to find but haven't been successful. Also, what is difference between `jsonb_path_query` AND `jsonb_path_query_first` ? Can you please explain. Also, how do I convert value to be the type like float etc., when using jsonpath in the query? Thanks – adbdkb Oct 27 '22 at 17:24
  • @a_horse_with_no_name Thankyou for your answer. I recently had a very similar problem to the original poster. Even simplifying to remove the filter expression, I found that `jsonb_path_query` removed rows that would have null values from my returned data. The answer explains how to solve the problem but doesn't explain why the original doesn't work. Perhaps my understanding of what `jsonb_path_query` does is off. Could you expand at all? Thanks again. – tim-mccurrach Feb 27 '23 at 22:09
0

As per PostgreSQL documentation the filter acts as WHERE condition

When defining the path, you can also use one or more filter expressions that work similar to the WHERE clause in SQL. A filter expression begins with a question mark and provides a condition in parentheses:

I managed to achieve what you're looking for using the LATERAL and a LEFT JOIN

SELECT id,
         *
FROM example left join 
LATERAL jsonb_path_query(
        TO_JSONB(data),
        '$[*] ? (@.key == "2").value'::JSONPATH) 
    on true;

Result

 id | id |                              data                              | jsonb_path_query 
----+----+----------------------------------------------------------------+------------------
  1 |  1 | [{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}] | "val2"
  2 |  2 | [{"key": "1", "value": "val1"}]                                | 
  3 |  3 | [{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}] | "val2"
(3 rows)
Ftisiot
  • 1,808
  • 1
  • 7
  • 13
0

To address the original question:

... why the null output of jsonb_path_query is omitted?

and the question added in the bounty of tim-mccurrach:

... why the whole row is omitted from the results in the case of a null result.

Either builds on the subtly incorrect assumption that there would be a "null output" or "null result". But, at first, there is really just "no row". Makes all the difference. Related:

jsonb_path_query() is a set-returning function ("SRF"). It doesn't return 1 result, but 0-n result rows.

Standard SQL does not allow SRF in the SELECT list, only in the FROM clause. But Postgres (like some other RDBMS) allows it. Traditionally, that led to odd corner cases. Exact behavior was finally settled (sanitized) with Postgres 10. See:

A single set-returning function like in the example:

SELECT id, jsonb_path_query(...) AS foo FROM example;

is effectively almost1 the same as:

SELECT id, foo FROM example CROSS JOIN LATERAL jsonb_path_query(...) foo;

The (implicit) join eliminates input rows where the SRF comes up empty ("no row"!).
OTOH it multiplies input rows where the SRF produces multiple result rows.

There is an example in the manual explaining this case exactly in the chapter "SQL Functions Returning Sets":

In the last SELECT, notice that no output row appears for Child2, Child3, etc. This happens because listchildren returns an empty set for those arguments, so no result rows are generated. This is the same behavior as we got from an inner join to the function result when using the LATERAL syntax.

PostgreSQL's behavior for a set-returning function in a query's select list is almost exactly the same as if the set-returning function had been written in a LATERAL FROM-clause item instead. For example,

SELECT x, generate_series(1,5) AS g FROM tab;

is almost equivalent to

SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g;

Bold emphasis mine.
There is more, including details for "almost"1 above. Read it.

Solution

Working with jsonb directly to trim the noise from casting json.

The original question is oblivious to the fact that multiple objects in the JSON array could qualify.

To get all matches as set like the original query, and also preserve input rows without match, replace the implicit CROSS JOIN with LEFT JOIN ... ON true. (Then "no row" is transformed to null!):

SELECT e.id
     , d.val #>> '{}' AS any_value
FROM   example e
LEFT   JOIN LATERAL jsonb_path_query(e.data, '$[*] ? (@.key == "2").value') d(val) ON true;

See:

Get only the first match, also preserving all input rows:

SELECT id
     , jsonb_path_query_first(data, '$[*] ? (@.key == "2").value') #>> '{}' AS first_value
FROM   example;

Or get all matches as JSON array, also preserving all input rows:

SELECT id
     , jsonb_path_query_array(data, '$[*] ? (@.key == "2").value') AS all_values
FROM   example;

fiddle - with added example row that produces two matches to convey the point.

Proper conversion to text

Either of my queries uses the same "trick" to cast the JSON value to text that a_horse already displayed: #>> '{}'. Here's the outstanding explanation:

The JSON #>> operator extracts the JSON sub-object at the specified path as text. The path is provided as Postgres text array text[]. '{}' is an empty array. So we get the root object as text. Voilá. See:

Converting the JSON array to text[] is slightly more sophisticated:

SELECT e.id
     , ARRAY(SELECT jsonb_array_elements_text(jsonb_path_query_array(e.data, '$[*] ? (@.key == "2").value'))) AS all_values_txt
     , jsonb_array_to_text_array(jsonb_path_query_array(e.data, '$[*] ? (@.key == "2").value')) AS all_values_txt2
FROM   example e;

Both convert the JSON array to a Postgres text[]. The second expression encapsulates that in a function as outlined here:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228