5

I've set up Postgres 9.6 and verified on a large table of random integers that parallel queries work. However, a simple XPath query on an XML column of another table always runs sequentially. Both xpath functions are marked as parallel safe in Postgres. I tried raising the cost of the xpath function so that the estimated query cost skyrocketed, but it changed nothing. What am I missing?

Example table DDL: CREATE TABLE "test_table" ("xml" XML );

Example query: SELECT xpath('/a', "xml") FROM "test_table";

Example data: <a></a>. Note that the real data contains XML documents that are 10-1000 kB in size.

> select pg_size_pretty(pg_total_relation_size('test_table'));
28 MB

> explain (analyze, verbose, buffers) select xpath('/a', "xml") from test_table;
Seq Scan on public.test_table  (cost=0.00..64042.60 rows=2560 width=32) (actual time=1.420..4527.061 rows=2560 loops=1)
  Output: xpath('/a'::text, xml, '{}'::text[])
  Buffers: shared hit=10588
Planning time: 0.058 ms
Execution time: 4529.503 ms
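For completeness, the parallel-related settings in effect can be checked like this (parameter names as in 9.6; the comments note the stock defaults):

```sql
-- Settings that gate parallel plans in Postgres 9.6
SHOW max_parallel_workers_per_gather;  -- must be > 0 for any parallel plan (default 0 in 9.6)
SHOW min_parallel_relation_size;       -- minimum relation size considered for a parallel scan (default 8MB)
SHOW parallel_setup_cost;              -- estimated cost of launching worker processes (default 1000)
SHOW parallel_tuple_cost;              -- estimated cost per tuple passed worker-to-leader (default 0.1)
```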
Eugene Pakhomov
  • Please post the query you are running, and preferably the table structure (which can be simplified) and some sample data too. -- Note that a single function which is not parallel safe is enough to disable parallelism for the whole query. – pozs Jun 07 '17 at 12:11
  • Well, it can't get any simpler than that, but I added it as you asked. – Eugene Pakhomov Jun 07 '17 at 23:34
  • 1
    Please **[EDIT]** your question and add the execution plan generated using **`explain (analyze, verbose)`**. [**Formatted text**](http://stackoverflow.com/help/formatting) please, [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) –  Jun 08 '17 at 05:42
  • Added execution plan – Eugene Pakhomov Jun 08 '17 at 17:57
  • 1
    The table might be too small to be considered for a parallel seq scan. The [default minimum size](https://www.postgresql.org/docs/current/static/runtime-config-query.html) is 8MB. However, 3 seconds for a seq scan of a table with just 2500 rows is way too slow. Is that an extremely wide table? i.e. does it have a lot of columns? Using `explain (analyze, buffers)` might give a hint for that –  Jun 08 '17 at 20:49
  • Updated the question. As you see, the size exceeds 8MB, and I guess the long time to run the query is explained by the size of XML documents. The `test_table` contains only one column, just as the question states. – Eugene Pakhomov Jun 10 '17 at 07:20
  • 1
    Try dropping the value of `parallel_tuple_cost` and `parallel_setup_cost`. Setting them to zero should make the planner select the parallel plan. Time your queries to see if the planner really is getting it wrong. In my experience `parallel_tuple_cost` can blow out the estimates of queries that return a lot of rows because the default value (0.1) is too high. – teppic Jun 21 '17 at 08:15

1 Answer

4

The relevant point here is likely the distinction between "relation size" and "total relation size":

CREATE TABLE test_table AS
  SELECT ('<a>' || repeat('x', 1000000) || '</a>')::xml AS "xml"
  FROM generate_series(1, 2560);

SELECT
  pg_size_pretty(pg_relation_size('test_table')) AS relation_size,
  pg_size_pretty(pg_total_relation_size('test_table')) AS total_relation_size;

 relation_size | total_relation_size
---------------+---------------------
 136 kB        | 30 MB

Large column values like these are not stored within the main relation, but instead are pushed to its associated TOAST table. This external storage does not count towards pg_relation_size(), which is what the optimiser appears to be comparing against min_parallel_relation_size when evaluating a parallel plan:

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_relation_size = '144kB';
EXPLAIN SELECT xpath('/a', "xml") FROM test_table;

                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..49.00 rows=2560 width=32)
SET min_parallel_relation_size = '136kB';
EXPLAIN SELECT xpath('/a', "xml") FROM test_table;

     QUERY PLAN
------------------------------------------------------------------------------
 Gather  (cost=0.00..38.46 rows=2560 width=32)
   Workers Planned: 1
   ->  Parallel Seq Scan on test_table  (cost=0.00..35.82 rows=1506 width=32)
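To see where the data actually lives, you can look up the TOAST table directly (a quick catalog query against pg_class; table name as in the example above):

```sql
-- The oversized xml values are stored in the TOAST table, not the main relation
SELECT reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'test_table';
```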
Nick Barnes
  • Strange - I repeated your steps exactly, and yet the final query plan was just the same as the previous one, sequential. Also, it's rather strange that your query plan has `Workers Planned: 1`. Are you sure that you haven't accidentally set `force_parallel_mode` somewhere between running two last or the provided queries? – Eugene Pakhomov Jun 23 '17 at 22:07
  • Please disregard my comment about `force_parallel_mode` - I have just 1 worker as well when `relation_size` is close to `min_parallel_relation_size`. To add more info: parallel queries are definitely working on my setup. When using a table with the real data, a simple query such as `SELECT char_length("name") FROM real_table` is being run in parallel after your `SET ...` queries. But running `SELECT xpath('/a', "xml") FROM real_table` still results in a sequential query. For what it's worth, the Postgres version is 9.6.3. – Eugene Pakhomov Jun 23 '17 at 22:14
  • Ah, that's just me being stupid. When testing things, I set cost of `xpath(text, xml, text[])` to 10000, but I remembered to reset cost to 1 only for `xpath(text, xml)`. And now it was preventing Postgres from using parallel plan for some reason. Apparently, I don't understand how cost works at all, but that's another matter. Thank you for your great answer! – Eugene Pakhomov Jun 23 '17 at 22:25
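The situation described in the last comment can be diagnosed and fixed like this (catalog columns as in 9.6; COST 1 is the stock default being restored):

```sql
-- Inspect cost and parallel safety of every xpath overload
SELECT oid::regprocedure AS signature, procost, proparallel
FROM pg_proc
WHERE proname = 'xpath';
-- proparallel: 's' = safe, 'r' = restricted, 'u' = unsafe

-- Reset the cost on the three-argument overload as well
ALTER FUNCTION xpath(text, xml, text[]) COST 1;
```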