1

I know that this title is overused, but it seems that my kind of question is not answered yet. So, the problem is like this:

I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way). In php I am generating queries to retrieve the data, and the result would still look like a normal table:

SELECT 
  (SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
  (SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2" 
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1') 
HAVING (col2 LIKE "%4%")

OR

SELECT * FROM 
(SELECT 
  (SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
  (SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2" 
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d 
WHERE col2 LIKE "%4%" 

note that the part where I define the columns of the result is generated by a php script. It is less important why I am doing this, but I want to extend this algorithm that generates the queries for a broader use. And we got to the core problem, I have to decide if I will generate a where or a having part for the query, and I know when to use them both, the problem is my algorithm doesn't and I have to make a few extra checks for this. But the two above queries are equivalent, I can always put any query in a sub-query, give it an alias, and use where on the new derived table. But I wonder if I will have problems with the performance or not, or if this will turn back on me in an unexpected way.

I know how they both work, and how where is supposed to be faster, but this is why I came here to ask. Hopefully I made myself understood, please excuse my english and the long useless turns of phrases, and all.

EDIT 1

I already know the difference between the two, and all that implies, my only dilemma is that using custom columns from other tables, with variable numbers and size, and trying to achieve the same result as using a normally created table implies that I must use HAVING for filtering the derived tables columns, at the same time having the option to wrap it up in a subquery and use where normally, this probably will create a temporary table that will be filtered afterwards. Will this affect performance for a large database? And unfortunately I cannot test this right now, as I do not afford to fill the database with over 1 billion entries (that will be something like this: 1 billion in rows table, 5 billions in values table, as every row have 5 columns, 5 rows in cols table and 1 row in tables table = 6,000,006 entries in total)

right now my database looks like this:

+----+--------+-----------+------+
| id | name   | title     | dets |
+----+--------+-----------+------+
|  1 | table1 | Table One |      |
+----+--------+-----------+------+

+----+-------+------+
| id | table | name |
+----+-------+------+
|  3 |     1 | col1 |
|  4 |     1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables` 


+----+-------+-------+
| id | table | extra |
+----+-------+-------+
|  1 |     1 |       |
|  2 |     1 |       |
+----+-------+-------+
where `table` is a foreign key from table `tables` 

+----+-----+-----+----------+
| id | row | col | value    |
+----+-----+-----+----------+
|  1 |   1 |   3 | 13       |
|  2 |   1 |   4 | 14       |
|  6 |   2 |   4 | 24       |
|  9 |   2 |   3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows` 
where `col` is a foreign key from table `cols` 

EDIT 2

The conditions are there just for demonstration purposes!

EDIT 3

For only two rows, it seems there is a difference between the two, the one using having is 0,0008 and the one using where is 0.0014-0.0019. I wonder if this will affect performance for large numbers of rows and columns

EDIT 4

The result of the two queries is identical, and that is:

+----------+------+
| col1     | col2 |
+----------+------+
| 13       | 14   |
| asdfghjk | 24   |
+----------+------+
khael
  • 2,600
  • 1
  • 15
  • 36

4 Answers4

4

HAVING is specifically for GROUP BY, WHERE is to provide conditional parameters. See also WHERE vs HAVING

Community
  • 1
  • 1
Thom Wiggers
  • 6,938
  • 1
  • 39
  • 65
  • yes, I read it already, I had a different question, please read the edit. – khael Jul 30 '13 at 14:30
  • The linked question goes into that in depth: HAVING is applied after WHERE and HAVING is able to operate on aggregated results. – Thom Wiggers Jul 30 '13 at 17:05
  • Well, that sums up pretty much the answer to my question, I guess I will have to make some pre-computations to determine how the system should build the query. Thank you. – khael Aug 02 '13 at 07:40
1

I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.

user2366842
  • 1,231
  • 14
  • 23
  • But when I will generate the whole query I want to be able to put specific filter conditions based on my aliases as if I would work with a "real" table, not a derived one, and this is why I wrap it all into a sub-query. Now, for only two rows, it seems there is a difference between the two, the one using `having` is 0,0008 and the one using `where` is 0.0014-0.0019. I wonder if this will affect performance for large numbers of rows and columns. – khael Jul 30 '13 at 14:11
1

See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html

Basically, WHERE filters out columns before passing them to an aggregate function, but HAVING filters the aggregate function's results.

Colonel Thirty Two
  • 23,953
  • 8
  • 45
  • 85
  • yes, but there will be created a derived table, that will be filtered after by my where condition in the wrapped version, and the same result thing will with `having` also because, after all the rows will be obtained a filter will be applied. I am asking why is there a performance difference, and if it will increase. – khael Jul 30 '13 at 14:29
0

you could do it like that

    WHERE col2 In (14,24)

your code WHERE col2 LIKE "%4%" is bad idea so what about col2 = 34 it will be also selected.

echo_Me
  • 37,078
  • 5
  • 58
  • 78
  • that is a dummy condition, it is there just as a demonstration. – khael Jul 30 '13 at 14:27
  • @khael and how much it takes with my condition ? – echo_Me Jul 30 '13 at 14:33
  • Unfortunately I think you missed the point a bit, you see I am more interested in the wrapping of the sub-query to use `where` -VS- using `having`. It was a dummy condition anyway. I mean it was only there for the test, in practice it will not be used. – khael Jul 30 '13 at 14:37