0

I've been struggling for hours and I can't find why this query takes too long (> 60 minutes). All 4 tables have less than 50.000 records.

Also if I remove any table (gel6, gf6 or ger6) the query takes less than 500 ms to execute. What am I doing wrong?

Explain plan: https://explain.depesz.com/s/ldm2

 SELECT COUNT(*)
 FROM agroapp.ganado g
 INNER JOIN (SELECT gel5.ganado_id, gel5.estado_leche  
             FROM agroapp.ganado_estado_leche gel5  
             INNER JOIN (SELECT MAX(gel3.ganado_estado_leche_id) ganado_estado_leche_id  
                         FROM agroapp.ganado_estado_leche gel3  
                         INNER JOIN (SELECT gel.ganado_id, MAX(gel.created) created  
                                     FROM agroapp.ganado_estado_leche gel   
                                     GROUP BY gel.ganado_id) gel2 ON (gel2.ganado_id = gel3.ganado_id AND gel2.created = gel3.created)  
                         GROUP BY gel3.ganado_id) gel4 ON gel4.ganado_estado_leche_id = gel5.ganado_estado_leche_id  
            ) gel6 ON gel6.ganado_id = g.ganado_id
 INNER JOIN (SELECT gf5.ganado_id, gf5.fundo_id  
             FROM agroapp.ganado_fundo gf5  
             INNER JOIN (SELECT MAX(gf3.ganado_fundo_id) ganado_fundo_id  
                         FROM agroapp.ganado_fundo gf3  
                         INNER JOIN (SELECT gf.ganado_id, MAX(gf.created) created  
                                     FROM agroapp.ganado_fundo gf  
                                     GROUP BY gf.ganado_id) gf2 ON (gf2.ganado_id = gf3.ganado_id AND gf2.created = gf3.created)    
                         GROUP BY gf3.ganado_id) gf4 ON gf4.ganado_fundo_id = gf5.ganado_fundo_id  
            ) gf6 ON gf6.ganado_id = g.ganado_id
 INNER JOIN (SELECT ger5.ganado_id, ger5.estado_reproductivo  
             FROM agroapp.ganado_estado_reproductivo ger5  
             INNER JOIN (SELECT MAX(ger3.ganado_estado_reproductivo_id) ganado_estado_reproductivo_id  
                         FROM agroapp.ganado_estado_reproductivo ger3  
                         INNER JOIN (SELECT ger.ganado_id, MAX(ger.created) created  
                                     FROM agroapp.ganado_estado_reproductivo ger  
                                     GROUP BY ger.ganado_id) ger2 ON (ger2.ganado_id = ger3.ganado_id AND ger2.created = ger3.created)  
                         GROUP BY ger3.ganado_id) ger4 ON ger4.ganado_estado_reproductivo_id = ger5.ganado_estado_reproductivo_id  
            ) ger6 ON ger6.ganado_id = g.ganado_id
WHERE g.organizacion_id = 21

Tables

CREATE TABLE agroapp.ganado_estado_leche
(
  ganado_estado_leche_id serial NOT NULL,
  organizacion_id integer NOT NULL,
  isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
  created timestamp without time zone NOT NULL DEFAULT now(),
  createdby numeric(10,0) NOT NULL,
  updated timestamp without time zone NOT NULL DEFAULT now(),
  updatedby numeric(10,0) NOT NULL,
  estado_leche character varying(80) NOT NULL,
  ganado_id integer NOT NULL,
  fecha_manejo timestamp without time zone NOT NULL,
  CONSTRAINT ganado_estado_leche_pk PRIMARY KEY (ganado_estado_leche_id),
  CONSTRAINT ganado_fk FOREIGN KEY (ganado_id)
      REFERENCES agroapp.ganado (ganado_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE TABLE agroapp.ganado_fundo
(
  ganado_fundo_id serial NOT NULL,
  organizacion_id integer NOT NULL,
  isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
  created timestamp without time zone NOT NULL DEFAULT now(),
  createdby numeric(10,0) NOT NULL,
  updated timestamp without time zone NOT NULL DEFAULT now(),
  updatedby numeric(10,0) NOT NULL,
  fundo_id integer NOT NULL,
  ganado_id integer NOT NULL,
  CONSTRAINT ganado_fundo_pk PRIMARY KEY (ganado_fundo_id),
  CONSTRAINT ganado_fk FOREIGN KEY (ganado_id)
      REFERENCES agroapp.ganado (ganado_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE TABLE agroapp.ganado_estado_reproductivo
(
  ganado_estado_reproductivo_id serial NOT NULL,
  organizacion_id integer NOT NULL,
  isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
  created timestamp without time zone NOT NULL DEFAULT now(),
  createdby numeric(10,0) NOT NULL,
  updated timestamp without time zone NOT NULL DEFAULT now(),
  updatedby numeric(10,0) NOT NULL,
  estado_reproductivo character varying(80) NOT NULL,
  ganado_id integer NOT NULL,
  fecha_manejo timestamp without time zone NOT NULL,
  CONSTRAINT ganado_estado_reproductivo_pk PRIMARY KEY (ganado_estado_reproductivo_id),
  CONSTRAINT ganado_fk FOREIGN KEY (ganado_id)
      REFERENCES agroapp.ganado (ganado_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE TABLE agroapp.ganado
(
  ganado_id serial NOT NULL,
  organizacion_id integer NOT NULL,
  isactive character(1) NOT NULL DEFAULT 'Y'::bpchar,
  created timestamp without time zone NOT NULL DEFAULT now(),
  createdby numeric(10,0) NOT NULL,
  updated timestamp without time zone NOT NULL DEFAULT now(),
  updatedby numeric(10,0) NOT NULL,
  fecha_nacimiento timestamp without time zone NOT NULL,
  tipo_ganado character varying(80) NOT NULL,
  diio_id integer NOT NULL,
  fundo_id integer NOT NULL,
  raza_id integer NOT NULL,
  estado_reproductivo character varying(80) NOT NULL,
  estado_leche character varying(80),
  CONSTRAINT ganado_pk PRIMARY KEY (ganado_id),
  CONSTRAINT diio_fk FOREIGN KEY (diio_id)
      REFERENCES agroapp.diio (diio_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fundo_fk FOREIGN KEY (fundo_id)
      REFERENCES agroapp.fundo (fundo_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT raza_fk FOREIGN KEY (raza_id)
      REFERENCES agroapp.raza (raza_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
danielctf
  • 129
  • 2
  • 10
  • 1
    FYI joining on subqueries with lots of rows is not great for performance because there are no indexes on those subqueries, so the join must scan the whole subquery content. And most of your joins are on subqueries. – 404 Dec 04 '18 at 20:42
  • 2
    Please **[EDIT]** your question and add the execution plan generated using **`explain (analyze, buffers)`** not just a "simple" explain. –  Dec 04 '18 at 20:44
  • I will try to re-define the queries to not have subqueries. In the meantime, I'll execute the explain analyze. Shall take a few hours. – danielctf Dec 04 '18 at 20:50
  • Explain Analyze: https://explain.depesz.com/s/loGo – danielctf Dec 05 '18 at 00:05
  • Anyone?. I've redefined the queries using DISTINCT ON, sadly same results – danielctf Dec 07 '18 at 23:41
  • Essential information is missing. Please consider instructions in the tag [tag-info of \[postgresql-performance\]](https://stackoverflow.com/tags/postgresql-performance/info). – Erwin Brandstetter Dec 11 '18 at 01:02

1 Answers1

0

Table design

  • This looks very much like a boolean column (yes / no):

    isactive character(1) NOT NULL DEFAULT 'Y'::bpchar
    

    If so, replace with:

    isactive bool NOT NULL DEFAULT TRUE
    
  • If you might involve multiple times zones in any way, use timestamptz instead of timestamp here:

    created timestamp without time zone NOT NULL DEFAULT now(),
    

The default now() produces timestamptz and after the assignment cast results in the current time according to the time zone of the session. I.e., the value changes with the timezone of the session, which is a sneaky point of failure. See:
- Ignoring time zones altogether in Rails and PostgreSQL

And:

createdby numeric(10,0) NOT NULL

et al. look like they should really be just integer. (Or maybe bigint if you really think you might burn through more than 2147483648 numbers ...)

Query

Looking at the first subquery:

 SELECT gel5.ganado_id, gel5.estado_leche  
 FROM agroapp.ganado_estado_leche gel5  
 INNER JOIN (
    SELECT MAX(gel3.ganado_estado_leche_id) ganado_estado_leche_id  
    FROM agroapp.ganado_estado_leche gel3  
    INNER JOIN (
       SELECT gel.ganado_id, MAX(gel.created) created  
       FROM agroapp.ganado_estado_leche gel   
       GROUP BY gel.ganado_id
       ) gel2 ON (gel2.ganado_id = gel3.ganado_id AND gel2.created = gel3.created)  
    GROUP BY gel3.ganado_id
    ) gel4 ON gel4.ganado_estado_leche_id = gel5.ganado_estado_leche_id

The innermost subquery gets the max. created per ganado_id, the next one the max ganado_estado_leche_id of those rows. And finally you join back and retrieve all ganado_id that appear in combination with the identified max ganado_estado_leche_id per partition. I have a hard time making sense of this, but it can be simplified to:

SELECT gel2.ganado_id
FROM   agroapp.ganado_estado_leche gel2
JOIN  (
   SELECT DISTINCT ON (ganado_id) ganado_estado_leche_id
   FROM   agroapp.ganado_estado_leche
   ORDER  BY ganado_id, created DESC NULLS LAST, ganado_estado_leche_id DESC NULLS LAST
   ) gel1 USING (ganado_estado_leche_id)

See:

Looks like an incorrect query to me. Same with the rest of the query: the joins multiply rows in an odd fashion. Not sure what you are trying to count, but I doubt the query counts just that. You did not provide enough information to make sense of it.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228