-1

So i happened to look at a query created within Django ORM and I am curious as to how it will execute Here is the query.

SELECT "workflowengine_formdata"."id", "workflowengine_formdata"."formfield_id", "workflowengine_formdata"."user_id", "workflowengine_formdata"."flow_id", "workflowengine_formdata"."file", "workflowengine_formdata"."text", "workflowengine_formdata"."created_at", "workflowengine_formdata"."updated_at" 
FROM "workflowengine_formdata" 
  INNER JOIN "workflowengine_formfield" 
          ON ("workflowengine_formdata"."formfield_id" = "workflowengine_formfield"."id") 
WHERE ("workflowengine_formdata"."flow_id" = 19a2e198-3731-414e-b247-c2cd08e659df 
   AND "workflowengine_formfield"."stage_id" = 1 
   AND "workflowengine_formfield"."expert_search_criteria" = True)

I am wondering, when the query does get executed what gets executed first? Does the part of the where clause get executed

("workflowengine_formdata"."flow_id" = 19a2e198-3731-414e-b247-c2cd08e659df 
  AND "workflowengine_formfield"."stage_id" = 1 
  AND "workflowengine_formfield"."expert_search_criteria" = True)

or the on

 "workflowengine_formdata"."formfield_id" = "workflowengine_formfield"."id"

I think the where clauses get executed first, because that will reduce the number of rows that will participate in the join. I think the "ON" part of the query will basically join a lot of rows which is counterproductive.

Rupin
  • 610
  • 7
  • 23
  • 2
    all DB engines will convert the SQL into a query plan, which is a step by step explanation of how the DB engine will get your data. It will use many different factors to determine the "best" way to fetch the data you asked for. Take the query you got from the ORM, and follow the steps in this article to give you a better idea. https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan – Jeremy Sep 12 '19 at 16:54
  • 1
    See a relational DB textbooks re query implementation/optimization, the DBMS manual re execution, optimization & indexes. There are many books. Also this is a faq. Before considering posting please always google your error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags, & read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. We cannot reason, communicate or search unless we make the effort to (re-re-re-)write clearly. – philipxy Sep 13 '19 at 01:15
  • Will the explain analyze still work if there was no data in the tables? – Rupin Sep 13 '19 at 06:17
  • You must believe I have done that research before I posted here, because I have been burnt many times on questions like these where there was no straight answer available for them. – Rupin Sep 13 '19 at 06:23
  • Follow my comment with perseverence in the future. That includes, read & search the manuals for your tools & textbooks for your specialties. (Sadly "rtfm" is a SO dirty word even though it is the most useful thing askers could know.) (My 1st hit googling 'site:postgresql.org/ optimization' is [an online Oracle "Performance Optimization" wiki](https://wiki.postgresql.org/wiki/Performance_Optimization). Note that the SO/SE search is poor. I wonder how you researched that you couldn't easily find answers at various levels of detail. PS Google SE notifications re @ in comments. – philipxy Sep 14 '19 at 03:18

3 Answers3

1

You could easily check it with explain:

explain SELECT ...;

Query optimizer could rewrite the query depending on statistics, available indexes and many other factors. As a rule of thumb yes, it will first reduce the number of rows and then perform join.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • My database tables currently do not have any rows, Will this still work? – Rupin Sep 13 '19 at 06:14
  • @Rupin The query plan differs depending on statistics(data distribution and so on). So you may get completely different one for empty table – Lukasz Szozda Sep 13 '19 at 15:52
0

You can check this link I think : https://sqlbolt.com/lesson/select_queries_order_of_execution

Like it is described : the first thing that is normally done by the engine is to define the target, after that it will reduce this first dataset page after page. (in my comprehension, maybe I am wrong).

But as already told : you can either check the query plan.

Jaisus
  • 1,019
  • 5
  • 14
  • It is interesting how the database works below the hood. I don't have data in my table yet, but i will have to come back and do this all over again when It does. – Rupin Sep 13 '19 at 06:19
0

For an inner join (but not an outer join) the distinction between the ON and the WHERE is purely textual/visual. The database planner will analyze the query as if it were all in the WHERE. There is no meaningful "ordering" between them.

jjanes
  • 37,812
  • 5
  • 27
  • 34
  • But if it is all in the where clause, how will developers know which clause to put first to reduce the extract number of rows thereby improving query performance? – Rupin Sep 13 '19 at 06:08
  • The planner is extremely complicated. We are not talking about 15 lines of code here. It has lots of capabilities built into it. It will consider hash joins, merge joins, and nested loop joins, and it has a cost model for each based on how many rows it thinks it will find as input and output, and what indexes you have. – jjanes Sep 13 '19 at 12:50