0

Ok this is slightly theoretical so it would be great if an unbiased database enthusiast gave an opinion.

For the sake of argument let's agree that there is such a concept as a "base table" w.r.t. to a query, where one table is driving the majority of information of the result set. Imagine a query where there are three relations - TableA, TableB, and TableC

Let's say TableA has cardinality of 1 million records and TableC has 500 records and TableC has 10,000.

Let's say the query is like so -

SELECT A.Col1
     , A.Col2
     , A.Col3
     , A.Col4
     , A.Col5
FROM TableA A
 LEFT JOIN TableB B ON B.ID = A.TableBID
 LEFT JOIN TableC C ON C.ID = A.TableCID

Ok, clearly TableA is the base relation above. It is the biggest table, it is driving the result set by being joined "from", and visually the columns are even on the "left side" of the result set. (The left side thing actually was a criterion to my colleague).

Now, let's assume that TableA has 1 million rows again, TableB is a "junction" or "bridge" table and has like 500,000 rows and TableC has 1,000,000 rows. So assume the query is just an outer join to get all columns in TableA and TableC where a relationship exists like below...

SELECT A.*
     , C.*
FROM TableC C
 FULL OUTER JOIN TableB B ON C.ID = B.TableAID
 FULL OUTER JOIN TableA A ON A.ID = B.TableCID

Ok so given the last query, can anyone tell me what the "base relation" is? I don't think there is one, but was hoping for another database person's opinion.

uh_big_mike_boi
  • 3,350
  • 4
  • 33
  • 64
  • I would be hesitant to accept any answers, since, in my experience, there is no such creature as an `unbiased database enthusiast`. – Eric Brandt Dec 08 '18 at 04:46
  • 1
    Hi. You give a vague pseudo-definition for a term (that already has a technical meaning) then ask us what you mean by it. What use is that? What is your justification for "I don't think there is one"? What use would it be? This is arguably off-topic as all of unclear, too broad & opinion-based. Can you edit to a more concrete & useful question? It seems you are trying to ask something like, is there a common or imaginable clear & useful notion of "a table [or tables?] that in some sense drive a query"? (There is a reasonable concrete technical argument for a broad "no", so I answered.) – philipxy Dec 08 '18 at 05:58
  • All you say re "base" is that from "biggest", "driving" (whatever that means), "joined from" (whatever that means) & "columns visually on the left" a certain table is "*clearly*" "the base". On the contrary, it is not clear at all what "base" is supposed to mean. – philipxy Dec 15 '18 at 23:21

4 Answers4

4

The term "base table" has a definition and it has nothing to do with what you describe. A "base table" is pretty much just a "table". That is, it is not a view, it is not a table valued function, it is not the result of a query. It is what gets stored in the database as an explicit table.

What you are seem to be grasping for seems more related to optimization strategies. I have used similar terminology -- in the context of optimization -- to describe the "driving table" being accessed by the optimizer. The purpose of this is to distinguish between different execution plans.

Consider the query:

from t1 join t2 using (col)

There are multiple different execution plans. Here are some methods and what might be considered the "driving table" (if any) for them:

for each row in t1
    for each row in t2
         compare col
-->  t1 is the "driving table"

for each row in t2
    for each row in t1
        compare col
--> t2 is the "driving table"

for each row in t1
    look up t2 value using index on t2(col)
--> t1 is the "driving table"

sort t1 by col
sort t2 by col
compare the rows in the two sorted sets
--> no "driving table"

hash t1 by col
hash t2 by col
compare the hash maps
--> no "driving table"

In other words, the "driving" table has little to do with the query structure. It is based on the optimization strategies used for the query. That said, left joins and right joins limit the optimization paths. So, in a nested loop or index-lookup situation, the "first" (or "last") table would be the driving table.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I voted your answer up one but I don’t think you really provided any clarification. Which actually goes to my point, that the base table as my colleague put it is only a vague notion at best (or in this case the “driving table”). Actually query optimization in a mature relational engine, as I’m sure you know, is so sophisticated that correctly choosing the “driving table” would be wayyyy beyond the scope of a SO question. I have gotten rid of multiple 1,000+ page books just on query optimization, they are huge. – uh_big_mike_boi Dec 08 '18 at 19:11
3

A base table is a given named table-valued variable--a database table. That's it. In a query expression its name is a leaf expression denoting its value. "Given table variable" would be more descriptive. A query can use literal notation for a table. It would be reasonable for a given named table-valued constant to also be called "base". It's nothing about some kind of "main" table.


The relational model is founded on a table holding the rows that make a true proposition (statement) from its (characteristic) predicate (statement template parameterized by column names). We give base table rows & get query expression rows.

A query expression that is a base table name comes with a predicate given by the designer.

/* (person, liked) rows where [liker] likes [liked] */
/* (person, liked) rows where Likes(liker, liked) */
SELECT * FROM Likes

A query expression that is a table literal has a certain predicate in terms of columns being equal to values.

/* (person) rows where
    person = 'Bob'
*/
SELECT * FROM (VALUES ('Bob')) dummy (person)

Otherwise a query expression has a predicate built from its constituent table expression predicates according to its relation operator.

  • Every algebra operator corresponds to a certain logic operator.
    NATURAL JOIN & AND
    RESTRICTtheta & ANDtheta
    UNION & OR
    MINUS & AND NOT
    PROJECTall butC & EXISTS C
    etc

/* (person) rows where
    (FOR SOME liked, Likes(person, liked))
OR  person = 'Bob'
*/
    SELECT liker AS person
    FROM Likes
UNION
    VALUES ('Bob')

/* (person, liked) rows where
FOR SOME [values for] l1.*, l2.*,
        person = l1.liker AND liked = l2.liked
    AND Likes(l1.liker, l1.liked)
    AND Likes(l2.liker, l2.liked)
    AND l1.liked = l2.liker
    AND person = 'Bob'
    AND NOT Likes(l1.liked, 'Ed')
*/
Likes l1 INNER JOIN Likes l2
ON l1.liked = l2.liker
WHERE l1.liker = 'Bob'
AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)

There's no difference to how a base, literal or operator-call query expression is used in determining a containing query expression's predicate.

Is there any rule of thumb to construct SQL query from a human-readable description?
Relational algebra - recode column values

philipxy
  • 14,867
  • 6
  • 39
  • 83
3

The concept of a "driving" table is really an assumption about how the DBMS is expected to execute a query internally. A rule-based query optimizer, in the absence of any index-related preferences, may treat the ordering of tables and joins in a query as significant when it comes to choosing the execution plan. Under a cost-based optimizer, there is no significance to the order of tables and joins so nothing about the structure of the query itself will tell you which table gets read first or in what order the join conditions get evaluated.

When conceptualizing a query it may help to have a mental image of one table being the starting point for the query but I think the answer to the question here must be no. Logically speaking there is no such thing as a driving table.

nvogel
  • 24,981
  • 1
  • 44
  • 82
2

Let me suggest a perspective where the base table is the first one in the FROM clause (ie not a JOINed table). In the case where a statement can be equally written with either one table or another as base table, we would say that there are two (or more) base tables.

In your first query, the base table is TableA. If you invert TableA and TableC in the query, you are not guaranteed to get the same results, because of the LEFT JOIN.

In the second query, as you are using FULL JOINs, all 3 tables could be inverted without changing the result, so this is indeed a use case of a query where all tables are base tables.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Ok so in the second query, you can easily switch where TableA and TableC are in the query. So in that case then TableC should not be the base table, according to the logic you just gave. There should be two base tables, TableA and TableC. In fact, If we added many more tables, TableD, TableE, and they all had columns in TableB, then they would all be candidates to be the base table. – uh_big_mike_boi Dec 08 '18 at 01:30
  • Also, your answer is even more flawed because in my first example, I could have put TableB as the first column and RIGHT JOIN to TableA, and TableB would then be the base table according to your logic. – uh_big_mike_boi Dec 08 '18 at 01:36
  • @big_mike_boiii, you to use `INNER JOIN` if you want to be able to swith TableA and TableC... Updated my answer with more details – GMB Dec 08 '18 at 01:41
  • The second query uses FULL OUTER JOINs. I think you are getting LEFT JOIN from first query. – uh_big_mike_boi Dec 08 '18 at 01:47
  • @big_mike_boiii yes you are right, sorry ! Updated the answer – GMB Dec 08 '18 at 01:57
  • I did upvote your question for the good point that multiple tables can be base table candidates. I can't give you the correct answer because being "the table right after the FROM clause" is only true when LEFT JOIN is used, which is only a semantic convention. If RIGHT JOIN is used, then the base table, by what I think the "spirit" of your definition really is, can be anywhere. By definition, something semantic like that should NOT be part of any rule because it is subjective. – uh_big_mike_boi Dec 08 '18 at 02:03
  • 1
    allright @big_mike_boiii I see your point. I upvoted your question, let's see if other advices pop-up ! – GMB Dec 08 '18 at 02:06
  • 1
    This answer is ridiculous. What if the `JOIN` were a `RIGHT JOIN`? – Gordon Linoff Dec 08 '18 at 13:02