0

data

id   amount created_at
"1" "105"   "2018-01-28 15:22:00"
"2" "120"   "2018-01-28 15:23:00"
"3" "200"   "2018-01-28 15:24:00"
"4" "205"   "2018-01-28 15:24:10"
"5" "230"   "2018-01-28 15:25:00"

This is my table fields and data, my cronjob running every minute, but for some reason, it might run again at the same minute, therefore I need to filter the data group by the date to minute

What I tried

SELECT COUNT(*), SUM(`amount`), MAX(`amount`), MIN(`amount`)
FROM (
SELECT *
FROM `stats`
GROUP BY DATE_FORMAT(`created_at`, '%Y-%m-%d %H:%i')
) AS tmp
GROUP BY DATE_FORMAT(`created_at`, '%Y-%m-%d %H:%i')

The result is exactly what I need, but I heard subquery is always the last choice because of the efficiency issue, is there another way to do it?

"1" "105"   "105"   "105"
"1" "120"   "120"   "120"
"1" "200"   "200"   "200"
"1" "230"   "230"   "230"
Chan
  • 1,947
  • 6
  • 25
  • 37
  • 1
    Even though MySQL allows you to use GROUP BY and * you shouldn’t. The results are not stable and may change whenever. In any case you shouldn’t just return all columns if you don’t need them. What does the query plan show? Some engines know how to rewrite subqueries away so a blanket statement about them isn’t very helpful. Also why do you want min, max, sum if there’s only one value? Just a test? – Sami Kuhmonen Jan 28 '18 at 07:58
  • 1
    Your query does not filter out duplicates because your records are not duplicates of each other. The sum, max and min in your query does not make any sense. You should describe what exactly you would like to do. – Shadow Jan 28 '18 at 07:59
  • @Shadow id number 4 should not be exists, I only want to do the stats "per minutes", this query filtered rest data at the same minutes – Chan Jan 29 '18 at 03:15
  • @Chan I get that but in this case you need to define which record you would like to pick from the ones within the same minute. The first one (earlier)? The last one? The one with the least or most amount? – Shadow Jan 29 '18 at 06:29
  • @Shadow, In this case, I need the first one – Chan Jan 29 '18 at 08:36
  • @Chan the two duplicate questions show solutions to the greatest-n-per-group problem without using subqueries (there are solutions in them using subqueries as well). The 2nd one is about finding the maximum value as opposed to minimum, but the same logic applies, just you need to reverse the greater than operator. However, pls note that a solution with subquery sometimes outperforms the solution without one, so you better test both approaches. – Shadow Jan 29 '18 at 13:32

1 Answers1

0

Your expected output implies that you want one record per minute, and either the record with the smallest id or amount, for that minute. Assuming you want the smallest id, then you may try the following query:

SELECT s1.*
FROM stats s1
INNER JOIN
(
    SELECT
        DATE_FORMAT(created_at, '%Y-%m-%d %H:%i') AS created_at
        MIN(id) AS min_id
    FROM stats
    GROUP BY DATE_FORMAT(created_at, '%Y-%m-%d %H:%i')
) s2
    ON DATE_FORMAT(s1.created_at, '%Y-%m-%d %H:%i') = s2.created_at AND
       s1.amount = s2.min_id;
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Smallest id which means the first data at the same minutes – Chan Jan 29 '18 at 03:17
  • @Chan Check my updated answer. – Tim Biegeleisen Jan 29 '18 at 03:20
  • thanks for your help, but it still uses subquery isn't it? – Chan Jan 29 '18 at 05:08
  • I know of no way of answering your question without using a subquery. Even if we had `ROW_NUMBER` available we would still need to subquery. Why do you think that your current approach is bad? – Tim Biegeleisen Jan 29 '18 at 05:12
  • @TimBiegeleisen if the OP needs the record with min/max value then 1) the question is a duplicate 2) can be done without subquery. I'm just wsiting for a confirmation from the OP. – Shadow Jan 29 '18 at 06:31