6

I have the following table with roughly 400 million rows. I have to perform SELECT queries on this table, generally involving WHERE clause on Date_createcolumn.

The key is that most of my queries are on full days (not taking into account hours and seconds)

+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| ID_num         | bigint(45)   | NO   | PRI | NULL    |       |
| Content        | varchar(250) | YES  |     | NULL    |       |
| User_ID        | bigint(24)   | NO   | MUL | NULL    |       |
| Location       | varchar(70)  | YES  |     | NULL    |       |
| Date_create    | datetime     | NO   |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

Would there be a significant increase in performance in:

  1. Creating a new column Day_create which would be of DATE type, (not DATETIME) and having an index on this column.

As opposed to:

  1. Having an index on the Date_create column of type DATETIME .
ylnor
  • 4,531
  • 2
  • 22
  • 39

2 Answers2

1

You could have an index on Date_create that works well, as long as you take the advice given here and query the field optimally (not using the DATE() function).

This way, you get the best of both worlds: fast SELECT queries and no extra storage required for an extra column.

Zamfi
  • 327
  • 2
  • 9
1

No, don't bother with an extra column.

But do be careful how you formulate queries. Otherwise, the query may fail to use the index.

WHERE Date_create >= '2017-02-01'
  AND Date_create  < '2017-02-01' + INTERVAL 1 MONTH

is a simple, clear, leap-year-proof, datatype-proof, way to fetch all rows for this February. (Variations on it work fine for 1 DAY, 36 HOUR, 2 WEEK, etc.)

You have not said what the rest of the WHERE clause looks like. This is also important, especially for 400M rows. If, for example, you want the yesterday's data for location 'abcd':

WHERE Location = 'abcd'
  AND Date_create >= CURDATE() - INTERVAL 1 DAY
  AND Date_create  < CURDATE()

INDEX(Location, Date_create) -- in this order!

'2017-02-01' and '2017-02-01 00:00:00' are identical values; I prefer the former since it is shorter. This applies whether you are comparing against DATETIME or DATE.

If you need to discuss further, please provide the query/queries you need to discuss. (Other indexes may be needed; single-day and day-range may work differently, etc.)

Rick James
  • 135,179
  • 13
  • 127
  • 222