How to query postgres for jsonb field that has a key set to a null value

Question

I have a table in postgres which has rows like the following in a jsonb column called data:

{ "block": { "data": null, "timestamp": "1680617159" } }
{"block": {"hash": "0xf0cab6f80ff8db4233bd721df2d2a7f7b8be82a4a1d1df3fa9bbddfe2b609e28", "size": "0x21b", "miner": "0x0d70592f27ec3d8996b4317150b3ed8c0cd57e38", "nonce": "0x1a8261f25fc22fc3", "number": "0x1847", "uncles": [], "gasUsed": "0x0", "mixHash": "0x864231753d23fb737d685a94f0d1a7ccae00a005df88c0f1801f03ca84b317eb", "gasLimit": "0x1388", "extraData": "0x476574682f76312e302e302f6c696e75782f676f312e342e32", "logsBloom": "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", "stateRoot": "0x2754a138df13677ca025d024c6b6ac901237e2bf419dda68d9f9519a69bfe00e", "timestamp": "0x55baa522", "difficulty": "0x3f5a9c5edf", "parentHash": "0xf50f292263f296897f15fa87cb85ae8191876e90e71ab49a087e9427f9203a5f", "sealFields": [], "sha3Uncles": "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347", "receiptsRoot": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421", "transactions": [], "totalDifficulty": "0x239b3c909daa6", "transactionsRoot": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"}, "transaction_receipts": []}

I'd like to write a SQL query which selects all rows that have a null value for block.data but does not select rows which do not have a data field.

I've tried the following, and they all have failed:

SELECT * FROM table WHERE data->>'block'->>'data' IS NULL;
SELECT * FROM table WHERE data->'block'->'data' IS NULL;
SELECT * FROM table WHERE jsonb_extract_path_text(data, 'block', 'data') IS NULL;

It seems that in all of these cases, if the data field isn't present, it passes the where clause.

lemon · Answer 1 · 2023-04-04T14:55:59.537

1

You can use the '@>' operator to check if '{"data":null}' is contained in the data->'block' path.

SELECT *
FROM tab
WHERE (data->'block') @> '{"data":null}'

Output:

data
{"block":{"data":null,"timestamp":"1680617159"}}

Check the demo here.

edited Apr 04 '23 at 14:55

answered Apr 04 '23 at 14:33

lemon

14,875
6
18
38

wonderful thank you! This query works but it's super slow. Is there a way to create an index on just specific keys (or specific paths) in the json column so speed this up? – Paymahn Moghadasian Apr 04 '23 at 14:49
2

`(data->>'block')::JSONB` can be simplified to `data->'block'` – Apr 04 '23 at 14:51
Would you expect a performance benefit from that @a_horse_with_no_name? Or is it just for conciseness? – Paymahn Moghadasian Apr 04 '23 at 14:51
1

Unfortunately, I think there's not much optimization to be carried out on json operations... for further infos, you could look at [this answer](https://stackoverflow.com/a/71088092/12492890), or probably @a_horse_with_no_name is likely to give more insights in this regard definitely. – lemon Apr 04 '23 at 14:53
2

Well, it should be faster because you are not converting a jsonb value to text to convert it back to jsonb. But I don't know if you could measure the difference. But I find it much cleaner. – Apr 04 '23 at 14:53
@a_horse_with_no_name do you have any suggestions for indexes (or any other mechanism) to speed up this query? I just did some research on GIN indexes and it's not clear to me if a GIN index would help in this case. – Paymahn Moghadasian Apr 04 '23 at 15:05

Marmite Bomber · Answer 2 · 2023-04-04T15:38:02.267

Use two predicates to see if the key exists *and if the value is null

where jsonb_path_exists(data,'$.block.data') and 
jsonb_path_query_first(data,'$.block') ->> 'data' is null;

Example

with test as ( 
select 1 id, '{ "block": { "data": null, "timestamp": "1680617159" } }'::jsonb  as data union all
select 2 id, '{ "block": { "data": 1, "timestamp": "1680617159" } }'::jsonb as data union all
select 3 id, '{ "block": {  "timestamp": "1680617159" } }'::jsonb as data)
select t.id
from test t 
where jsonb_path_exists(data,'$.block.data') and 
jsonb_path_query_first(data,'$.block') ->> 'data' is null;

id|
--+
 1|

This query leads to an Seq Scanof the table as shown below

explain analyze
select t.id
from test t 
where jsonb_path_exists(data,'$.block.data') and 
jsonb_path_query_first(data,'$.block') ->> 'data' is null;

Seq Scan on test t  (cost=10000000000.00..10000003289.05 rows=167 width=4) (actual time=45.745..68.798 rows=1 loops=1)
  Filter: (jsonb_path_exists(data, '$."block"."data"'::jsonpath, '{}'::jsonb, false) AND ((jsonb_path_query_first(data, '$."block"'::jsonpath, '{}'::jsonb, false) ->> 'data'::text) IS NULL))
  Rows Removed by Filter: 100002
Planning Time: 0.079 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.714 ms, Inlining 8.161 ms, Optimization 26.128 ms, Emission 11.304 ms, Total 46.308 ms
 Execution Time: 69.586 ms  <<<--------------

If you have a very low number of rows with the data key defined, you may profit from the index

CREATE INDEX idx_test2 ON test USING btree (((jsonb_path_exists(data,'$.block.data'))));

The plan uses the index and is much more performant

Bitmap Heap Scan on test t  (cost=551.35..2965.38 rows=167 width=4) (actual time=0.034..0.037 rows=1 loops=1)
  Filter: (jsonb_path_exists(data, '$."block"."data"'::jsonpath, '{}'::jsonb, false) AND ((jsonb_path_query_first(data, '$."block"'::jsonpath, '{}'::jsonb, false) ->> 'data'::text) IS NULL))
  Rows Removed by Filter: 1
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on idx_test2  (cost=0.00..551.31 rows=50002 width=0) (actual time=0.025..0.025 rows=2 loops=1)
        Index Cond: (jsonb_path_exists(data, '$."block"."data"'::jsonpath, '{}'::jsonb, false) = true)
Planning Time: 0.164 ms
Execution Time: 0.052 ms  <<<--------------

Used test data

create table test as
select 1 id, '{ "block": { "data": null, "timestamp": "1680617159" } }'::jsonb  as data, repeat('x',2000) pad union all
select 2 id, '{ "block": { "data": 1, "timestamp": "1680617159" } }'::jsonb as data, repeat('x',2000) pad union all
select 3 id, '{ "block": {  "timestamp": "1680617159" } }'::jsonb as data, repeat('x',2000) pad union all
select id+3, '{ "block": {  "timestamp": "1680617159" } }'::jsonb as data, repeat('x',2000) pad from generate_series(1,100000) as t(id )
;

Would you expect this solution to be faster than the other proposed solution? Alternatively, would this solution benefit from indexes? — Paymahn Moghadasian, Apr 04 '23 at 14:57

klin · Accepted Answer · 2023-04-04T15:30:45.833

1

Just use 'null'::jsonb in the condition:

select *
from my_table
where data->'block'->'data' = 'null'

Other answers, however effective, are unnecessarily complex and hide the essence of things. The problem is that the 'null'::jsonb is not the same as Postgres null:

select 'null'::jsonb is null;

 ?column?
----------
 f
(1 row)

You can use a simple btree index that can support the first query:

create index on my_table ((data->'block'->'data'));

Alternatively, the gin index can support the @> operator:

create index on my_table using gin ((data->'block'->'data'));

select *
from my_table
where data->'block'->'data' @> 'null';

edited Apr 04 '23 at 15:30

answered Apr 04 '23 at 15:02

klin

112,967
15
204
232

I think you may have a typo in your answer (`'null'::jsonb` vs `'null'`) but yes, this is by far the cleanest. Do you have any suggestions for what sort of index I could use to make this query faster? – Paymahn Moghadasian Apr 04 '23 at 15:08
wonderful thank you! I'll give that btree index a shot and see how it helps. I'd guess the btree index will be much smaller than the GIN index, is that correct? – Paymahn Moghadasian Apr 04 '23 at 15:32
Yes, gin index seems to be a bit redundant in the case. – klin Apr 04 '23 at 15:34
The btree index worked perfectly. The query is now super fast even on ~17 million rows. – Paymahn Moghadasian Apr 05 '23 at 09:47

How to query postgres for jsonb field that has a key set to a null value

3 Answers3