0

My question is the following:

As asked in the question "How to count amount of rows referring to a particular row foreign key in MySql?", I want to count table references involving multiple tables referring to the table I'm interested about. However here we want the specific number of references per row for the resourced table.

In addition, what about the variant where the tables do reference eachother, but the foreign key does not exist?

Let's setup some minimal examples; We have three tables, here called A, B, and C. B and C refer rows in A. I want to count the total amount of references for each row in A.

Contents of the first table (A), and expected query results in the column 'Count':;

+----+------------+-------+
| ID |    Name    | Count |
+----+------------+-------+
|  1 | First row  |     0 |
|  2 | Second row |     5 |
|  3 | Third row  |     2 |
|  4 | Fourth row |     1 |
+----+------------+-------+

Contents of the second table (B):

+----+------+
| ID | A_ID |
+----+------+
|  1 |    2 |
|  2 |    2 |
|  3 |    2 |
+----+------+

Contents of the third table (C):

+----+------+
| ID | A_ID |
+----+------+
|  1 |    2 |
|  2 |    2 |
|  3 |    3 |
|  4 |    3 |
|  5 |    4 |
+----+------+

Important restrictions for a solution

  1. The solution should work with n tables, for reasonable values of n. The example has n=2.
  2. The solution should not involve a subset of the product set of all the tables. As some rows from A may be referenced a bunch of times in all the other tables the size of the product set may well be stupidly large (e.g. 10*10*10*... becomes big quickly). E.g. it may not be O(q^n) where n is the number of tables and q is the amount of occurrences.
aphid
  • 1,135
  • 7
  • 20
  • "it may not be `O(q^n)` where n is the number of tables and q is the amount of occurrences. " it's really hard to figure out a run time complexity for a SQL query because of the MySQL optimizer and possible changing query plans.. "it may not be `O(q^n)` where n is the number of tables" sounds like you not understanding how SQL works or your not thinking logical.. If you define multiple tables it's logical you need to name those tables within your SQL query.. – Raymond Nijland Apr 19 '18 at 13:35
  • What I mean by that is that when you do a plain JOIN, mySQL will multiply with each join. E.g. if table B has two rows matching A, and C has three, JOINing both will result in 6 rows. This happens before any counting. Thus if you join say 30 tables you can get 10^30 rows in your result if each had 10 rows for a particular row in table A. That's a problem, obviously. Thus the restriction; it means you can't aggregate first, then remove duplicates later, as it's too expensive. – aphid Apr 19 '18 at 13:54

1 Answers1

0

This is a partial solution, which I believe still suffers from performance problems related to condition [2]

I'm adding this as an answer as it may be useful for those working towards a better solution

Apply the following query. Extend as necessary with additional tables, adding additional lines to both the sum and the set of JOINs. This particular solution will work as long as you have less than about 90 tables. With more than that, you will have to run multiple queries like it and cache the results (for example by creating a column in the 'A' table), then sum all these later on.

SELECT  
    COUNT(DISTINCT B.ID) + 
    COUNT(DISTINCT C.ID) -- + .....
    AS `Count` 
FROM A
    LEFT JOIN B ON A.ID = B.A_ID
    LEFT JOIN C ON A.ID = C.A_ID

Unfortunately, if you have often referenced rows, the query will have a massive intermediate result, run out of memory, and thus never complete.

aphid
  • 1,135
  • 7
  • 20
  • Don't forget to add indexes when using JOIN's otherwise the performance will be really bad also and will also result in a much higher run time complexity – Raymond Nijland Apr 19 '18 at 13:33