1

I struggle with optimizing the performance of my table with vertical partitioning. The following Select Statement should be more optimized from postgre imho:

SELECT 
  "ProductView".name, 
  "ProductView".price, 
  "ProductView".pid
FROM 
  "DefaultSchema"."ProductView";

My Schema looks like this:

tables:
ProductA(**pid**, name, price)
ProductB(**pid**, desc)
view:
Product(**pid**,name, price, desc)

The SQL:

CREATE TABLE "DefaultSchema"."ProductA"
(
  pid integer NOT NULL,
  price integer,
  name text,
  CONSTRAINT pk_pa PRIMARY KEY (pid)
)

CREATE TABLE "DefaultSchema"."ProductB"
(
  pid integer NOT NULL,
  "desc" text,
  CONSTRAINT "PK_PB" PRIMARY KEY (pid),
  CONSTRAINT "FK_PID" FOREIGN KEY (pid)
      REFERENCES "DefaultSchema"."ProductA" (pid) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)

CREATE OR REPLACE VIEW "DefaultSchema"."ProductView" AS 
 SELECT p1.pid,
    p1.price,
    p1.name,
    p2."desc"
   FROM "DefaultSchema"."ProductA" p1
     JOIN "DefaultSchema"."ProductB" p2 ON p1.pid = p2.pid;

So you might recognise I do not really need ProductB for the select query. Nevertheless it is joined during the process of execution, as you can see here.

"Hash Join  (cost=36.10..74.61 rows=1160 width=40) (actual time=0.090..0.105 rows=7 loops=1)"
"  Hash Cond: (p2.pid = p1.pid)"
"  ->  Seq Scan on "ProductB" p2  (cost=0.00..22.30 rows=1230 width=4) (actual time=0.022..0.027 rows=7 loops=1)"
"  ->  Hash  (cost=21.60..21.60 rows=1160 width=40) (actual time=0.030..0.030 rows=7 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"        ->  Seq Scan on "ProductA" p1  (cost=0.00..21.60 rows=1160 width=40) (actual time=0.010..0.017 rows=7 loops=1)"
"Total runtime: 0.299 ms"

My Question is how can I force postgre to scan only ProductA? Do I need an addional constraint, edit the config file or is it not possible to gain a performance advantage through vertical partitioning in postgre? Thanks a lot in advance. :)

Rick
  • 2,080
  • 14
  • 27
  • 1
    "Total runtime: 0.299 ms", how much faster should it be? This is next to nothing, hard to improve. 1 millisecond is fast, but 0.299 millisecond, that's really fast, faster than most of us can blink with our eyes... – Frank Heikens Nov 05 '14 at 15:53
  • The total time is irrelevant, because in this test scenario I only had 7 records. The interesting point is how the DBMS optimizes the query in a semantic way. If I imagine I have one million records it make a huge different if I only scan one table or if I scan two tables and join them. – Rick Nov 06 '14 at 10:45

1 Answers1

1

PostgreSQL's query planner does not yet do join removal on inner joins.

You can either query "ProductA" alone, or rewrite the view to use a left outer join. PostgreSQL 9.0+ does do join removal on left outer joins.

CREATE OR REPLACE VIEW "DefaultSchema"."ProductView" AS 
 SELECT p1.pid,
    p1.price,
    p1.name,
    p2."desc"
   FROM "DefaultSchema"."ProductA" p1
   LEFT JOIN "DefaultSchema"."ProductB" p2 ON p1.pid = p2.pid;

explain analyze
SELECT "ProductView".name, "ProductView".price, "ProductView".pid
FROM "ProductView";
QUERY PLAN
--
Seq Scan on "ProductA" p1  (cost=0.00..20.00 rows=1000 width=41) (actual time=0.008..0.225 rows=1000 loops=1)

Rewriting to use a left outer join isn't safe in every application, but I think it's safe for your particular problem.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • Thanks a lot Mike, you absolutely hit the point! It kind of makes sense now. I do not think it should be possible to optimize an inner join without touching the table. From an logical point of view I do not know the result of an inner join for sure, so I cannot optimize it. – Rick Nov 06 '14 at 10:40