1

I have following tables:

CREATE TABLE person (
  id INTEGER NOT NULL,
  name TEXT,
  CONSTRAINT person_pkey PRIMARY KEY(id)
);

INSERT INTO person ("id", "name")
VALUES 
  (1, E'Person1'),
  (2, E'Person2'),
  (3, E'Person3'),
  (4, E'Person4'),
  (5, E'Person5'),
  (6, E'Person6');

CREATE TABLE person_book (
  id INTEGER NOT NULL,
  person_id INTEGER,
  book_id INTEGER,
  receive_date DATE,
  expire_date DATE,
  CONSTRAINT person_book_pkey PRIMARY KEY(id)
);

/* Data for the 'person_book' table  (Records 1 - 9) */

INSERT INTO person_book ("id", "person_id", "book_id", "receive_date", "expire_date")
VALUES 
  (1, 1,  1, E'2016-01-18', NULL),
  (2, 1,  2, E'2016-02-18', E'2016-10-18'),
  (3, 1,  4, E'2016-03-18', E'2016-12-18'),
  (4, 2,  3, E'2017-02-18', NULL),
  (5, 3,  5, E'2015-02-18', E'2016-02-23'),
  (6, 4, 34, E'2016-12-18', E'2018-02-18'),
  (7, 5, 56, E'2016-12-28', NULL),
  (8, 5, 34, E'2018-01-19', E'2018-10-09'),
  (9, 5, 57, E'2018-06-09', E'2018-10-09');

CREATE TABLE book (
  id INTEGER NOT NULL,
  type TEXT,
  CONSTRAINT book_pkey PRIMARY KEY(id)
) ;

/* Data for the 'book' table  (Records 1 - 8) */

INSERT INTO book ("id", "type")
VALUES 
  ( 1, E'Btype1'),
  ( 2, E'Btype2'),
  ( 3, E'Btype3'),
  ( 4, E'Btype4'),
  ( 5, E'Btype5'),
  (34, E'Btype34'),
  (56, E'Btype56'),
  (67, E'Btype67');

My query should list name of all persons and for persons with recently received book types of (book_id IN (2, 4, 34, 56, 67)), it should display the book type and expire date; if a person hasn’t received such book type it should display blank as book type and expire date.

My query looks like this:

SELECT p.name,
   pb.expire_date,
   b.type
   FROM 
   (SELECT p.id AS person_id, MAX(pb.receive_date) recent_date
    FROM 
        Person p
        JOIN person_book pb ON pb.person_id = p.id
    WHERE pb.book_id IN (2, 4, 34, 56, 67)
    GROUP BY p.id
   )tmp 
   JOIN person_book pb ON pb.person_id = tmp.person_id
   AND tmp.recent_date = pb.receive_date AND pb.book_id IN 
   (2, 4, 34, 56, 67)
   JOIN book b ON b.id = pb.book_id           
   RIGHT JOIN Person p ON p.id = pb.person_id 

The (correct) result:

  name   | expire_date |  type
---------+-------------+---------
 Person1 | 2016-12-18  | Btype4
 Person2 |             |
 Person3 |             |
 Person4 | 2018-02-18  | Btype34
 Person5 | 2018-10-09  | Btype34
 Person6 |             |

The query works fine but since I'm right joining a small table with a huge one, it's slow. Is there any efficient way of rewriting this query?

My local PostgreSQL version is 9.3.18; but the query should work on version 8.4 as well since that's our productions version.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Ms workaholic
  • 373
  • 2
  • 8
  • 21

1 Answers1

1

Problems with your setup

My local PostgreSQL version is 9.3.18; but the query should work on version 8.4 as well since that's our productions version.

That makes two major problems before even looking at the query:

  1. Postgres 8.4 is just too old. Especially for "production". It has reached EOL in July 2014. No more security upgrades, hopelessly outdated. Urgently consider upgrading to a current version.

  2. It's a loaded footgun to use very different versions for development and production. Confusion and errors that go undetected. We have seen more than one desperate request here on SO stemming from this folly.

Better query

This equivalent should be substantially simpler and faster (works in pg 8.4, too):

SELECT p.name, pb.expire_date, b.type
FROM  (
   SELECT DISTINCT ON (person_id)
          person_id, book_id, expire_date
   FROM   person_book
   WHERE  book_id IN (2, 4, 34, 56, 67)
   ORDER  BY person_id, receive_date DESC NULLS LAST
   ) pb
JOIN   book        b ON b.id = pb.book_id
RIGHT  JOIN person p ON p.id = pb.person_id;

To optimize read performance, this partial multicolumn index with matching sort order would be perfect:

CREATE INDEX ON person_book (person_id, receive_date DESC NULLS LAST)
WHERE  book_id IN (2, 4, 34, 56, 67);

In modern Postgres versions (9.2 or later) you might append book_id, expire_date to the index columns to get index-only scans. See:

About DISTINCT ON:

About DESC NULLS LAST:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hi, You're right I had mistypes in my query; I really like your posts, always helpful; So I appreciate it if you take a look and let me know your suggestion – Ms workaholic Jan 10 '18 at 16:25
  • You are right about the issues of having two different versions, especially that I can't use windows functions in the production version while they're so needed in my queries, but unfortunately I don't have any authority to resolve that. I have talked to them a month ago, hopefully they take a step forward. By the way, very helpful answer, as usual, thank you so much – Ms workaholic Jan 11 '18 at 16:10