0

I am running the following query:

SELECT fat.*   
FROM Table1 fat  
LEFT JOIN modo_captura mc       ON mc.id = fat.modo_captura_id  
INNER JOIN loja lj              ON lj.id = fat.loja_id  
INNER JOIN rede rd              ON rd.id = fat.rede_id  
INNER JOIN bandeira bd          ON bd.id = fat.bandeira_id  
INNER JOIN produto pd           ON pd.id = fat.produto_id  
INNER JOIN loja_extensao le     ON le.id = fat.loja_extensao_id  
INNER JOIN conta ct             ON ct.id = fat.conta_id
INNER JOIN banco bc             ON bc.id = ct.banco_id  
LEFT JOIN conciliacao_vendas cv ON fat.empresa_id = cv.empresa_id AND cv.chavefato = fat.chavefato AND fat.rede_id = cv.rede_id  
WHERE 1 = 1  
AND cv.controle_upload_arquivo_id = 6906  
AND fat.parcela = 1  
ORDER BY fat.data_venda, fat.data_credito limit 20

But very slowly. Here the Explain plan: http://explain.depesz.com/s/DnXH

Tiago Rolim
  • 65
  • 1
  • 11
  • Your version of Postgres? Basic information on the purpose of the query and your indexes? Which of the WHERE conditions can change and how? – Erwin Brandstetter Jul 30 '14 at 18:22
  • Depending on how postgres handles join criteria and left joins, you may get improved performance by moving the fat.parcela to the initial left join. Why? because the filter is applied before the cartesean is generated as opposed to after. so if fat.parcela = 1 takes 1,000,000 records to 1 you reduce overhead SIGNIFICANTLY. the few rows involved the fewer rows it has to materialize/combine before the where clause is executed. so... `ON mc.id = fat.modo_captura_id AND fat.parcela = 1` may be much faster... It just really depends on HOW the engine generates the results. – xQbert Jul 30 '14 at 18:30
  • @xQbert: As it happens, we discussed that just yesterday under this related question: http://stackoverflow.com/questions/25002263/sample-query-to-show-cardinality-estimation-error-in-postgresql/25002962#25002962. Basically, Postgres freely arranges tables as long as they are joined by `INNER JOIN, CROSS JOIN, or unadorned JOIN`. A `LEFT JOIN` limits the planner in his choices. – Erwin Brandstetter Jul 30 '14 at 18:44
  • Interesting link. I'm still curious though if by moving the limiting criteria to the join if the engine will reduce the size of the cartesean being generated and thus (with this many joins) likely result in a performance gain. I'll look into that later when I have more free time. I'm not sure what limits a Left join has in postgresql (does the system always process those last when generating the result set? would moving the limit to the join result in better performance or is the engine smart enough to do that? I've seen it happen both ways in other RDBMS not sure what would happen here. – xQbert Jul 30 '14 at 18:59

1 Answers1

1

Try this rewritten version:

SELECT fat.*   
FROM   Table1 fat
JOIN   conciliacao_vendas cv USING (empresa_id, chavefato, rede_id)
JOIN   loja lj               ON lj.id = fat.loja_id  
JOIN   rede rd               ON rd.id = fat.rede_id  
JOIN   bandeira bd           ON bd.id = fat.bandeira_id  
JOIN   produto pd            ON pd.id = fat.produto_id  
JOIN   loja_extensao le      ON le.id = fat.loja_extensao_id  
JOIN   conta ct              ON ct.id = fat.conta_id
JOIN   banco bc              ON bc.id = ct.banco_id
LEFT   JOIN modo_captura mc  ON mc.id = fat.modo_captura_id  
WHERE  cv.controle_upload_arquivo_id = 6906  
AND    fat.parcela = 1  
ORDER  BY fat.data_venda, fat.data_credito
LIMIT  20;

JOIN syntax and sequence of joins

In particular I fixed the misleading LEFT JOIN to conciliacao_vendas, which is forced to act as a plain [INNER] JOIN by the later WHERE condition anyways. This should simplify query planning and allow to eliminate rows earlier in the process, which should make everything a lot cheaper. Related answer with detailed explanation:

USING is just a syntactical shorthand.

Since there are many tables involved in the query and the order the rewritten query joins tables is optimal now, you can fine-tune this with SET LOCAL join_collapse_limit = 1 to save planning overhead and avoid inferior query plans. Run in a single transaction:

BEGIN;
SET LOCAL join_collapse_limit = 1;
SELECT ...;  -- read data here
COMMIT;      -- or ROOLBACK;

More about that:

Index

Add some indexes on lookup tables with lots or rows (not necessary for just a couple of dozens), in particular (taken from your query plan):

Seq Scan on public.conta ct ... rows=6771
Seq Scan on public.loja lj ... rows=1568
Seq Scan on public.loja_extensao le ... rows=16394

That's particularly odd, because those columns look like primary key columns and should already have an index ...

So:

CREATE INDEX conta_pkey_idx ON public.conta (id);
CREATE INDEX loja_pkey_idx ON public.loja (id);
CREATE INDEX loja_extensao_pkey_idx ON public.loja_extensao (id);

To make this really fat, a multicolumn index would be of great service:

CREATE INDEX foo ON Table1 (parcela, data_venda, data_credito);
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • How do you know the left join was misleading? Maybe the constraint should have been placed on the join as the OP may want all records from table1 and only those that match in conciliacao_vendas.... – xQbert Jul 30 '14 at 18:35
  • @xQbert: My query returns the *same* result as the original - probably faster, though. Detailed rationale can be found in the linked answer. – Erwin Brandstetter Jul 30 '14 at 18:37
  • I'm not disagreeing with your answer. I simply fail to see how you came to the conclusion the left join was what was misleading when it could just as easily be that the where clause is wrong and the current results are also incorrect. If someone knows to use a LEFT JOIN, perhaps they simply don't know how to handle limiting criteria when a LEFT JOIN is used...But I digress it's an OP issue... I really like your update (no sarcasm), "To make this really **FAT** a multicolumn index would be of great service):" – xQbert Jul 30 '14 at 18:51
  • @xQbert: Oh, I meant "fast", but "fat" serves nicely. :) – Erwin Brandstetter Jul 30 '14 at 18:52