Do NOT disable strict
or ONLY_FULL_GROUP_BY
!
You should not disable strict
or remove ONLY_FULL_GROUP_BY
. The problem is that your query is ambiguous, and that's something you need to fix. It might not make a difference to your output, but it might also cause huge problems. It's better to be sure.
A great explanation can be read on Percona (summed up below).
Let's look at the problem
Consider the following situation:
+----+--------------------+---------+---------------------+
| id | page_url | user_id | ts |
+----+--------------------+---------+---------------------+
| 1 | /index.html | 1 | 2019-04-17 12:21:32 |
| 2 | /index.html | 2 | 2019-04-17 12:21:35 |
| 3 | /news.php | 1 | 2019-04-17 12:22:11 |
| 4 | /store_offers.php | 3 | 2019-04-17 12:22:41 |
| 5 | /store_offers.html | 2 | 2019-04-17 12:23:04 |
| 6 | /faq.html | 1 | 2019-04-17 12:23:22 |
| 7 | /index.html | 3 | 2019-04-17 12:32:25 |
| 8 | /news.php | 2 | 2019-04-17 12:32:38 |
+----+--------------------+---------+---------------------+
Now we want to issue a query to calculate the most visited pages. This is probably what you're used to writing:
SELECT page_url, user_id, COUNT(*) AS visits
FROM web_log
GROUP BY page_url
ORDER BY COUNT(*) DESC;
But look at the results:
+-------------------+---------+--------+
| page_url | user_id | visits |
+-------------------+---------+--------+
| /index.html | 1 | 3 |
| /news.php | 1 | 2 |
| /store_offers.php | 3 | 2 |
| /faq.html | 1 | 1 |
+-------------------+---------+--------+
The query works, but it’s not really correct. It is easily understandable that page_url
is the column of the grouping function, the value we are most interested in and we want to be unique for counting. Also, the visits
column is good, as it’s the counter. But what about user_id
? What does this column represent?
We grouped on the page_url
so the value returned for user_id
is just one of the values in the group. In fact, it was not only user to visit the index.html, as users 2 and 3 visited the page. So what should we make of that value? Is it the first visitor? Is it the last one?
We don’t know the right answer! The user_id
column’s value is a random item of the group!
The Solution
You need to consider if you need the values not used in the groupBy()
. If not, then just use a select()
to explicitly name the column you need.
If you DO need a column not used in the groupBy()
, use an aggregate function (like SUM()
or GROUP_CONCAT()
or MAX()
) as part of a Laravel selectRaw
query. Then you can be sure that your query is giving you what you expect.
If you know it doesn't matter (like in this example), then there's a miscellaneous function called ANY_VALUE()
that makes it clear that any value will do.
So in the above example, you could do:
SELECT page_url, ANY_VALUE(user_id), COUNT(*) AS visits
FROM web_log
GROUP BY page_url
ORDER BY COUNT(*) DESC;
Or in Laravel:
WebLog::selectRaw('page_url', 'ANY_VALUE(user_id)', 'COUNT(*) AS visits')
->groupBy('page_url')
->orderBy('visits')
->get();
Now you are clearly stating that the value doesn't matter, rather than trusting you haven't made a mistake. And if it DOES matter (and you don't apply ANY_VALUE()
) then MySQL will alert you to the problem.
Here's a list of available aggregate functions.