Like a_horse already advised (and you mentioned yourself), the proper fix is to extract those attributes to separate columns, normalizing your design to some extent.
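For illustration, a minimal sketch of that fix, with assumed attribute names (name, mnemonic) and assuming value holds valid JSON:

ALTER TABLE data
  ADD COLUMN name text
, ADD COLUMN mnemonic text;

UPDATE data
SET    name = value::json ->> 'name'
     , mnemonic = value::json ->> 'mnemonic';

Queries then read the small columns and never have to de-toast the big value.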
Can an index help?
Sadly, no (as of Postgres 14).
It could work in theory. Since your values are big, an expression index on just a few small attributes could be used by Postgres for an index-only scan, even when retrieving all rows (a case where it would otherwise ignore indexes).
The manual:
However, PostgreSQL's planner is currently not very smart about such cases. It considers a query to be potentially executable by index-only scan only when all columns needed by the query are available from the index.
So you would have to include value itself in the index, even if just as INCLUDE column - totally spoiling the whole idea. No go.
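To illustrate with assumed attribute names: an expression index like

CREATE INDEX data_attr_idx ON data ((value::json ->> 'name'), (value::json ->> 'mnemonic'));

covers the extracted values in theory, but for a query like

SELECT value::json ->> 'name', value::json ->> 'mnemonic' FROM data;

the planner still sees value being referenced and rules out the index-only scan.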
You can probably still do something in the short term. Two crucial quotes:
I am looking to see if I can make this faster in short term
The json blobs are a bit large
Data type
Drop the cast to json from the query. Casting every time adds pointless cost.
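A purely hypothetical illustration (the actual query and the stored data type are not shown here): if value is stored as text and the blob is only shipped to the client as-is, then

SELECT value::json FROM data;

parses and validates every blob per row, while

SELECT value FROM data;

returns the stored text unchanged.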
Compression
One major cost factor will be compression. Postgres has to "de-toast" the whole large column, just to extract some small attributes. Since Postgres 14, you can switch the compression algorithm (if support is enabled in your version!). The default is dictated by the config setting default_toast_compression, which is set to pglz by default. Currently the only available alternative is lz4. You can set that per column. Any time.
LZ4 (lz4) is considerably faster, while typically compressing a bit less: about twice as fast, but around 10 % more storage (depends!). If performance is not an issue, it's best to stick to the stronger compression of the default LZ algorithm (pglz). There may be more compression algorithms to pick from in the future.
To implement:
ALTER TABLE data
ALTER COLUMN value SET COMPRESSION lz4;
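Alternatively, a sketch for changing the default for newly stored values instead of a single column (default_toast_compression is an ordinary config setting in Postgres 14+):

SET default_toast_compression = 'lz4';              -- current session only
ALTER SYSTEM SET default_toast_compression = 'lz4'; -- whole cluster
SELECT pg_reload_conf();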
Setting a new COMPRESSION for a column does not re-compress automatically. Postgres remembers the compression method and only re-compresses if it's forced to un-compress anyway. You may want to force re-compression of existing values. You can check with:
SELECT pg_column_compression(value) FROM data LIMIT 10;
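To actually force re-compression of existing rows, one hedged sketch (assuming value is of type json): the round-trip cast builds a new datum, which then gets toasted again with the new setting. It rewrites every row, so vacuum afterwards:

UPDATE data SET value = value::text::json;
VACUUM ANALYZE data;

Then re-check with pg_column_compression().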
GENERATED columns
While stuck with the broken design, you might just add some (small!) generated columns to cover your query:
ALTER TABLE data
ADD COLUMN name text GENERATED ALWAYS AS (value::json ->> 'name') STORED
, ADD COLUMN mnemonic text GENERATED ALWAYS AS (value::json ->> 'mnemonic') STORED
...
And then target only those generated columns, not involving the big value at all.
SELECT name, mnemonic, ... FROM data;
That would bypass the main performance issue.
However, you mentioned:
It's the data that is further down in the json blob that often changes.
Every change to value forces a re-check on the generated columns, so that adds write cost.