
I'm running a simple query that retrieves data from a single table. If I select only non-JSON fields from the table, the query takes 16 ms. If I include a field that references a value inside the JSONB data, the runtime increases about 62x; selecting two different JSONB fields doubles that again.

--EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time,
  segment_data::json->'summary'->'begin_milage' as begin_milage,
  segment_data::json->'summary'->'end_milage' as end_milage
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;

The query takes 2.0 seconds with both JSON fields in the SELECT clause. Omitting one brings it down to 1.0 s; omitting both brings it down to 16 ms.

The table itself has about 700 records and the query returns 83 of them. Running different queries, I noticed that the more records are returned, the longer the query takes to complete (approximately 0.0066 · x^1.32 ms, where x is the number of rows returned) when selecting two JSON fields.

I tried adding an index for the vehicle/trip lookup, but as expected it did not make much difference. It seems to be the actual retrieval of the rows and the extraction of values from the JSONB field that take the time. Had the JSON fields been needed in the WHERE clause, this kind of degradation would have been more understandable, but that is not the case.

A simple solution would of course be to pull each field out of the JSON blob and create separate fields for this in the table itself. But before I go that route is there anything else that could fix this performance issue?
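For reference, that fallback could look something like the sketch below. The column names, the numeric type, and the exact JSON key spellings are assumptions (the actual JSON schema isn't shown here):

```sql
-- Sketch: promote the two JSONB values to regular columns.
-- Assumes the mileage values are numeric; adjust the type as needed.
ALTER TABLE segments_table
  ADD COLUMN begin_mileage numeric,
  ADD COLUMN end_mileage   numeric;

-- Backfill from the existing JSONB data (->> returns text, hence the cast).
UPDATE segments_table
SET begin_mileage = (segment_data->'summary'->>'begin_mileage')::numeric,
    end_mileage   = (segment_data->'summary'->>'end_mileage')::numeric;
```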

Here are the EXPLAIN (ANALYZE, buffers) results:

Sort  (cost=13.25..13.27 rows=10 width=28) (actual time=1999.899..1999.901 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 35kB
  Buffers: shared hit=5663
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..13.08 rows=10 width=28) (actual time=1.332..1999.730 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=5663
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.368 ms
Execution time: 2000.000 ms

Another interesting observation: running the same query multiple times shows no improvement from the caching I would expect to take place on subsequent identical queries.

The only modifications I have made to the stock Postgres server config are increasing shared_buffers from 128MB to 256MB and setting effective_cache_size = 1GB. I also reduced max_connections from 100 to 20.

The above results were obtained under Windows 7 on an 8-core i7. I also ran the same test under Ubuntu on a dual-core CPU, and the query took about the same time: 2.2 seconds (with the two JSONB fields in the SELECT clause).


Update:

Single JSON field in SELECT clause:

EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time, 
  segment_data::json->'summary'->'end_mileage' as end_mileage
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;

Result:

Sort  (cost=13.15..13.17 rows=10 width=28) (actual time=999.695..999.696 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 26kB
  Buffers: shared hit=2834
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..12.98 rows=10 width=28) (actual time=0.781..999.554 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=2834
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.353 ms
Execution time: 999.777 ms

No JSON field in SELECT clause:

EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;

Result:

Sort  (cost=13.05..13.07 rows=10 width=10) (actual time=0.194..0.205 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 19kB
  Buffers: shared hit=5
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..12.88 rows=10 width=10) (actual time=0.088..0.122 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=5
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.048..0.048 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.590 ms
Execution time: 0.280 ms

Table definition:

CREATE TABLE public.segments_table
(
  segment_id integer NOT NULL DEFAULT nextval('segments_table_segment_id_seq'::regclass),
  vehicle smallint NOT NULL,
  trip smallint NOT NULL,
  segment smallint NOT NULL,
  begin_time timestamp without time zone NOT NULL,
  segment_data jsonb,
  CONSTRAINT segments_table_pkey PRIMARY KEY (segment_id),
  CONSTRAINT segments_table_vehicle_64df5bc5_uniq UNIQUE (vehicle, trip, segment, begin_time)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX segments
  ON public.segments_table
  USING btree
  (segment);

CREATE INDEX vehicles
  ON public.segments_table
  USING btree
  (vehicle);

CREATE INDEX trips
  ON public.segments_table
  USING btree
  (trip);
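
The three single-column indexes above could in principle be replaced by one composite index matching the WHERE clause, though as noted the lookup itself is not the bottleneck here. A sketch (the index name is made up):

```sql
-- Composite index covering the vehicle + trip filter in one structure.
-- Note: the existing UNIQUE (vehicle, trip, segment, begin_time) constraint
-- already provides an index with this prefix, so this may be redundant.
CREATE INDEX segments_table_vehicle_trip_idx
  ON public.segments_table (vehicle, trip);
```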

Update #2:

Fixing the casting issue @Mark_M pointed out (changing `::json` to `::jsonb`) reduces the query time from 2 s to 300 ms:

EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time,
  segment_data::jsonb->'summary'->'begin_mileage' as begin_mileage,
  segment_data::jsonb->'summary'->'end_mileage' as end_mileage
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;

Sort  (cost=13.15..13.17 rows=10 width=28) (actual time=296.339..296.342 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 35kB
  Buffers: shared hit=5663
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..12.98 rows=10 width=28) (actual time=0.275..296.229 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=5663
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.045..0.045 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.352 ms
Execution time: 296.473 ms

That improved things quite a bit, but the query is still ~18x slower than looking up only non-JSON fields. Is this a reasonable performance hit for using a JSONB field?
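
One further thing that might be worth trying (an untested sketch): extracting the `summary` object once in a subquery, so the large JSONB value is not processed separately for each `->` access in the outer SELECT. On pre-12 versions of Postgres, `OFFSET 0` acts as an optimization fence that keeps the planner from inlining the subquery:

```sql
SELECT
  segment_no,
  begin_time,
  s->'begin_mileage' AS begin_mileage,
  s->'end_mileage'   AS end_mileage
FROM (
  SELECT
    segment AS segment_no,
    begin_time,
    segment_data->'summary' AS s   -- extract the summary object once
  FROM segments_table
  WHERE vehicle = 644 AND trip = 3
  OFFSET 0                         -- optimization fence (pre-PG12 behavior)
) sub
ORDER BY begin_time;
```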

    It looks like you are selecting jsonb types and casting them as json. Does it improve when you just use `segment_data->'summary'->'begin_milage'` – Mark Sep 20 '17 at 18:43
  • Is segment, begin_time, vehicle and trip in an index by any chance? – Joe Love Sep 20 '17 at 19:35
  • no, they are not indexed explicitly, but perhaps `segments_table_vehicle_64df5bc5_uniq` is internally? – kashiraja Sep 20 '17 at 19:57
  • That's a good improvement with the cast change. I'm not sure why it would be slow now, but since you know the columns are jsonb, you might try not casting at all as in my original comment: `segment_data->'summary'->'begin_mileage'`. I think it's a long shot, but maybe casting jsonb to jsonb is forcing an unneeded operation. – Mark Sep 20 '17 at 20:36
  • @Mark_M correct, the `::jsonb` is not needed. It seems it is just ignored as the query time is still the same. – kashiraja Sep 20 '17 at 20:54
  • 1
    The major difference is: `Buffers: shared hit=5663` vs. `Buffers: shared hit=5`. When you request the JSON column, Postgres needs to read a **lot** more data then without it. One buffer is 8kB –  Sep 20 '17 at 21:09
  • For some reason Postgres is sorting the json data even if it's not the target of the sort. I have a very similar problem (pg 10.4) and omitting the `ORDER BY` improves the query time a lot. I'm probably doing it wrong, but for lack of time will resort to this workaround; first get the ids with sorted query, then another select for the JSON – MarkM Feb 12 '19 at 12:53
