
How to avoid the unnecessary CPU cost?

See this historic question with failing tests. Example: j->'x' is a JSONB value representing a number and j->'y' a boolean. From the first versions of JSONB (released in 2014 with 9.4) until today (6 years!), with PostgreSQL v12... it seems we need to enforce a double conversion:

  1. Discard the j->'x' "binary JSONB number" information and transform it into the printable string j->>'x';
    discard the j->'y' "binary JSONB boolean" information and transform it into the printable string j->>'y'.

  2. Parse the string to obtain a "binary SQL float" by casting the string: (j->>'x')::float AS x;
    parse the string to obtain a "binary SQL boolean" by casting the string: (j->>'y')::boolean AS y.
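The two steps above can be illustrated in a single self-contained query (the sample values are hypothetical, chosen only to show both conversions):

```sql
-- Double conversion: binary JSONB → text (->>), then text → SQL type (::)
SELECT (j->>'x')::float   AS x,   -- number  → text → float
       (j->>'y')::boolean AS y    -- boolean → text → boolean
FROM (SELECT '{"x": 12.3, "y": true}'::jsonb) t(j);
```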

Is there no syntax or optimized function that lets a programmer enforce the direct conversion?

I don't see it in the guide... Or was it never implemented: is there a technical barrier to it?


NOTES about a typical scenario where we need it

(responding to comments)

Imagine a scenario where your system needs to store many, many small datasets (a real example!) with minimal disk usage, managing them all with centralized control/metadata/etc. JSONB is a good solution, and offers at least 2 good alternatives for storing them in the database:

  1. Metadata (with a schema descriptor) and the whole dataset in an array of arrays;
  2. Separating the metadata and the table rows into two tables.

(and variations where the metadata is translated to a cache of text[], etc.)
Alternative 1, monolithic, is best for the "minimal disk usage" requirement, and faster for full information retrieval. Alternative 2 can be the choice for random access or partial retrieval, when the table Alt2_DatasetLine also has one more column, like time, for time series.
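The two alternatives can be sketched as table definitions; the table names Alt1_AllDataset and Alt2_DatasetLine come from this text, but the column names and types are assumptions for illustration:

```sql
-- Alternative 1: metadata plus the whole dataset in one JSONB array of arrays
CREATE TABLE Alt1_AllDataset (
  dataset_id int PRIMARY KEY,
  metadata   jsonb,   -- schema descriptor and other control information
  j_alldata  jsonb    -- e.g. [[...row1...], [...row2...], ...]
);

-- Alternative 2: metadata and dataset rows separated into two tables
CREATE TABLE Alt2_Dataset (
  dataset_id int PRIMARY KEY,
  metadata   jsonb
);
CREATE TABLE Alt2_DatasetLine (
  dataset_id int REFERENCES Alt2_Dataset(dataset_id),
  j          jsonb    -- one dataset row per table row
);
```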

You can create all the SQL VIEWs in a separate schema, for example:

CREATE VIEW mydatasets.t1234 AS 
  SELECT (j->>'d')::date AS d,  j->>'t' AS t,  (j->>'b')::boolean AS b,
         (j->>'i')::int AS i,  (j->>'f')::float AS f
  FROM (
   SELECT jsonb_array_elements(j_alldata) j FROM Alt1_AllDataset
   WHERE dataset_id=1234
  ) t
  -- or FROM alt2...
;

And the CREATE VIEWs can all be automated by running the SQL string dynamically... we can reproduce the above "stable schema casting" with simple formatting rules, extracted from the metadata:

SELECT string_agg( CASE 
   WHEN x[2]!='text' THEN format(E'(j->>\'%s\')::%s AS %s',x[1],x[2],x[1])
   ELSE  format(E'j->>\'%s\' AS %s',x[1],x[1])
  END, ',' ) as x2
FROM (
 SELECT  regexp_split_to_array(trim(x),'\s+') x
 FROM regexp_split_to_table('d date, t text, b boolean, i int, f float', ',') t1(x)
) t2;
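Putting the pieces together, the generated column list can feed a dynamic CREATE VIEW. A sketch in PL/pgSQL, assuming the Alt1_AllDataset layout above (the DO block, variable names, and the hard-coded schema string are illustrative, not from the original):

```sql
DO $$
DECLARE
  cols text;
BEGIN
  -- Build the casting column list from the metadata schema descriptor
  SELECT string_agg( CASE
     WHEN x[2] <> 'text' THEN format(E'(j->>\'%s\')::%s AS %s', x[1], x[2], x[1])
     ELSE format(E'j->>\'%s\' AS %s', x[1], x[1])
    END, ', ')
  INTO cols
  FROM (
    SELECT regexp_split_to_array(trim(x), '\s+') x
    FROM regexp_split_to_table('d date, t text, b boolean, i int, f float', ',') t1(x)
  ) t2;

  -- Create the per-dataset view dynamically
  EXECUTE format(
    'CREATE VIEW mydatasets.t1234 AS
       SELECT %s
       FROM (SELECT jsonb_array_elements(j_alldata) j
             FROM Alt1_AllDataset WHERE dataset_id = 1234) t',
    cols);
END
$$;
```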

... It's a "real life scenario", and this (apparently ugly) model is surprisingly fast for small-traffic applications. There are other advantages besides disk usage reduction: flexibility (you can change the dataset schema without changing the SQL schema) and scalability (2, 3, ... 1 billion different datasets in the same table).

Returning to the question: imagine a dataset with ~50 or more columns; the SQL VIEW will be faster if PostgreSQL offers a "binary to binary" cast.

Peter Krauss
  • What exactly is your question? –  Jun 24 '20 at 11:27
  • Hi @a_horse_with_no_name, I was editing... Now the phrase is in bold – Peter Krauss Jun 24 '20 at 11:29
  • Well, that's the price you pay for de-normalizing I guess –  Jun 24 '20 at 11:31
  • @a_horse_with_no_name Hum... I disagree on a detail: the price is that it is not automatic, but the question is about "a programmer enforcing the direct conversion". At least the option should remain: to create an **optimized** SQL VIEW for a **stable** JSONB schema. – Peter Krauss Jun 24 '20 at 11:34
  • If you can create a relational view on a JSONB schema that means (at least to me) that the step to de-normalize isn't actually needed and you could get rid of the JSON part completely. –  Jun 24 '20 at 11:36
  • @a_horse_with_no_name, I will disagree again, but it is not so simple to explain ... I edited again to include a complex scenario illustrating context and usage. The main gains with (de-normalized) JSONB are flexibility (changes in the dataset schema with no need to change the SQL schema), scalability (2, 3, ... 1 billion different datasets in the same table) and disk usage reduction. – Peter Krauss Jun 24 '20 at 12:37
  • [This answer](https://stackoverflow.com/a/25002347/1048572) suggests that `jsonb_populate_record[set]` might skip the conversion to text. – Bergi Jun 24 '20 at 12:57
  • `imagine a dataset with 500 columns` is not a realistic scenario. – wildplasser Jun 24 '20 at 13:08
  • Hi @wildplasser, it is real in Small-BigData joins (I am working with telecom big data)... But ok, ~50 columns is not so rare for datasets and has a measurable optimization problem, see [this 55-column country-codes dataset](https://github.com/datasets/country-codes/blob/master/data/country-codes.csv) – Peter Krauss Jun 24 '20 at 13:21

1 Answer


Short answer: No, there is no better way in PostgreSQL to extract a jsonb number than (for example)

CAST(j ->> 'attr' AS double precision)

A JSON number happens to be stored as PostgreSQL numeric internally, so that wouldn't work “directly” anyway. But there is no fundamental reason why there could not be a more efficient way to extract such a value as numeric.

So, why don't we have that?

  1. Nobody has implemented it. That is often an indication that nobody thought it worth the effort. I personally think that this would be a micro-optimization – if you want to go for maximum efficiency, you extract that column from the JSON and store it directly as a column in the table.

    It is not necessary to modify the PostgreSQL source to do this. It is possible to write your own C function that does exactly what you envision. If many people thought this was beneficial, I'd expect that somebody would already have written such a function.

  2. PostgreSQL has just-in-time compilation (JIT). So if an expression like this is evaluated for a lot of rows, PostgreSQL will build executable code for that on the fly. That mitigates the inefficiency and makes it less necessary to have a special case for efficiency reasons.

  3. It might not be quite as easy as it seems for many data types. JSON standard types don't necessarily correspond to PostgreSQL types in all cases. That may seem contrived, but look at this recent thread in the Hackers mailing list that deals with the differences between the numeric types between JSON and PostgreSQL.
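Regarding point 2, you can verify whether JIT kicks in for such an expression-heavy query by inspecting the plan; a sketch, reusing the question's Alt1_AllDataset table (the key 'x' and cost thresholds are illustrative):

```sql
-- JIT is available in v11+ and engages only when the planner's cost
-- estimate exceeds jit_above_cost; it is shown in a "JIT:" plan section.
SET jit = on;
EXPLAIN (ANALYZE)
SELECT (j->>'x')::float
FROM Alt1_AllDataset,
     jsonb_array_elements(j_alldata) AS j;
```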

None of the above are reasons that such a feature could never exist; I just wanted to give reasons why we don't have it.

Laurenz Albe
  • Good news about [JIT](https://www.postgresql.org/docs/12/jit-reason.html) (in fact the good [LLVM Compiler Infrastructure](https://llvm.org/)) and the automatic [use-JIT decision](https://www.postgresql.org/docs/12/jit-decision.html)! ... Yes, only a few "binary to binary" optimizations are possible, so the average gain of a *micro-optimized cast* can be low and difficult to estimate... Perhaps there is no way to do a "benefit-cost analysis" of the investment (into a C implementation), only a benchmarking process after implementation. Thanks also for finding the link to the recent thread discussion. – Peter Krauss Jun 24 '20 at 17:30
  • My guess about the small gains in the "binary JSONB → binary SQL" conversions: good for *boolean→boolean*, *number→numeric*, *null→null::type*, *number→int*, *number→bigint*; and bad for *number→float*, *number→double*. – Peter Krauss Jun 24 '20 at 17:46
  • Converting `numeric` to any integer type cannot be cheap, as they are quite different. – Laurenz Albe Jun 25 '20 at 05:28
  • Hum... It is not easy to understand the [*jbvType*](https://doxygen.postgresql.org/jsonb_8h.html#a016055310b18a95fe223af00b4994bafa7f00d5d05aaab5eeb376035a7dc98d3d)... If we suppose that *jbvNumeric* is the "JSON binary value" for a JSON number, it seems it is not the same as the [internal SQL numeric](https://doxygen.postgresql.org/numeric_8h.html); and, yes, it doesn't seem to offer a direct conversion to `int4`. Do you have reliable links on the in-memory representations? – Peter Krauss Jun 28 '20 at 10:20
  • Another issue, testing PostgreSQL **v12**. Now we can do direct JSONB-to-SqlType conversions (!), but is it only syntactic sugar for the string conversion? (my EXPLAIN ANALYSE says it is sugar). Example: `SELECT (j->'i')::int,(j->>'i')::int, (j->'f')::float,(j->>'f')::float FROM (SELECT '{"i":123,"f":12.34}'::jsonb) t(j);` is working fine. Curiously the NULL-to-SqlType doesn't work, *"ERROR: cannot cast jsonb null to type integer"*. – Peter Krauss Jun 28 '20 at 10:20
  • For questions as these, you have to read the source. – Laurenz Albe Jun 29 '20 at 05:29
  • It seems that the optimization was implemented (!), see [this benchmark ("explode" case)](https://dba.stackexchange.com/a/271249/90651) using `LATERAL jsonb_populate_record()`. – Peter Krauss Jul 20 '20 at 00:59