1

Lets say, I have a table:

+------------+-----------+------+-----+-------------------+-----------------------------+
| Field      | Type      | Null | Key | Default           | Extra                       |
+------------+-----------+------+-----+-------------------+-----------------------------+
| id         | int(10)   | NO   | PRI |                   | AUTOINCREMENT               |
| id_action  | int(10)   | NO   | IDX |                   |                             |
| a_date     | date      | NO   | IDX |                   |                             |
| a_datetime | datetime  | NO   | IDX |                   |                             |
+------------+-----------+------+-----+-------------------+-----------------------------+

Each row has some id_action, and the a_date and a_datetime when it was executed on the website.

My question is, when I want to return COUNT() of each id_action grouped by a_date, is it same, when I use this two selects, or they are different in speed? Thanks for any explanation.

SELECT COUNT(id_action), id_action, a_date
FROM my_table
GROUP BY a_date
ORDER BY a_date DESC

and

SELECT COUNT(id_action), id_action, DATE_FORMAT(a_datetime, '%Y-%m-%d') AS `a_date`
FROM my_table
GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d')
ORDER BY a_date DESC

In other words, my question is, that each action has its datetime, and if I really need column a_date, or it is the same using DATE_FORMAT function and column a_datetime and I dont need column a_date?

Dharman
  • 30,962
  • 25
  • 85
  • 135
Legionar
  • 7,472
  • 2
  • 41
  • 70
  • DATE_FORMAT for every row is slower than `LEFT(a_datetime, 10)`, as ilan said, drop the a_date column, you could always use LEFT or DATE_FORMAT – Michael Freund Dec 08 '14 at 12:09
  • 2
    Unless grouping by the PK column(s), you must GROUP BY ALL the non-aggregated columns within the SELECT – Strawberry Dec 08 '14 at 12:14
  • 1
    But when I drop `a_date`, will `LEFT(a_datetime, 10)` use index on column `a_datetime`? That table will have milions of rows. – Legionar Dec 08 '14 at 12:48
  • 1
    @Legionar, if you group by `function(column)`, then index on `column` can not be used => performance suffers. On the other hand, if you are going to count (i.e. read) all rows anyway, having an index may be irrelevant. You have to try it yourself on your system and measure the difference. Index on `a_date` column may be useful in other reporting queries. – Vladimir Baranov Apr 08 '15 at 23:57

6 Answers6

2

I ran both the queries on similar table on MySQL 5.5.

The table has 10634079 rows.

First one took 10.66 initially and always takes approx 10 secs on further attempts.

Seconds Query takes 1.25 mins to execute first time, on second, 3rd.... attempts its taking 22.091 secs

So in my view, if your are looking for performance, then you must have column a_date, as its taking half of the time when executed without Date_Format.

If performance is not the primay concern (like data redundancy can be) then a_datetime column will serve all other date/datetime related purposes.

Anil
  • 3,722
  • 2
  • 24
  • 49
1

DATE : The DATE type is used for values with a date part but no time part.

DATETIME: The DATETIME type is used for values that contain both date and time parts.

so if you have DATETIME you can always derive DATE from it but from DATE you can not get DATETIME.

And as per your sql there will not be a major difference.

It will be better not to have a_date because you already have a_datetime.

but in general if you can use TIMESTAMP you should, because it is more space-efficient than DATETIME.

Suchit kumar
  • 11,809
  • 3
  • 22
  • 44
  • I understand difference between `DATE` and `DATETIME`, my question is, if column `a_datetime` will be enough, if I need daily report, so if index on this column will be fast or not, when there will be milions of rows... – Legionar Dec 08 '14 at 12:57
  • it will be comparatively slower than Date because of storage.better use timestamp if possible. – Suchit kumar Dec 08 '14 at 13:01
1

Using a_date to group by day will be more efficient than a_datetime because of your conversion. In T-SQL I use a combination of DATEADD() and DATEDIFF() to get the date only from DATETIME since math is more efficient than data conversion. For example (again, using T-SQL though I'm sure there's something similar for MySQL):

SELECT COUNT(id_action), id_action,
    DATEADD(DD,DATEDIFF(DD,0,a_datetime),0) AS [a_date]
FROM my_table
GROUP BY DATEADD(DD,DATEDIFF(DD,0,a_datetime),0) AS [a_date]
ORDER BY a_date DESC

This will find the number of days between day 0 and a_datetime then add that number of days to day 0 again. (Day 0 is just an arbitrary date chosen for it's simplicity.)

Perhaps the MySQL version of that would be:

DATE_ADD('2014-01-01', INTERVAL DATEDIFF('2014-01-01',a_datetime) DAY)

Sorry I don't have MySQL installed or I would try that myself. I'd expect it to be more efficient than casting/formatting but less efficient than using a_date.

BAReese
  • 491
  • 2
  • 5
0

If you are doing a function in your group by clause: "GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d')", you will not be leveraging your index: "a_datetime".

As for speed, I believe there will be no noticeable difference between indexing on datetime vs date (but it's always easy to test with 'explain')

Lastly, you can always read a datetime as a date (using cast functions if need be). Your schema is not normalized if you have both a a_date and a_datetime. You should consider removing one of them. If date provides enough granularity for your application, then get rid of datetime. Otherwise, get rid of a_date and cast as required

ilan berci
  • 3,883
  • 1
  • 16
  • 21
0

As already mentioned, the performance of any function(o_datetime) will be worse than just a_date. The choice depends on on your needs, if there is no need to DATETIME, take a DATE and that is.

If you still need to find a function to convert, then I advise you to take a date().

See also How to cast DATETIME as a DATE in mysql?

Community
  • 1
  • 1
artoodetoo
  • 918
  • 10
  • 55