5

Consider the following query:

SELECT * FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1;

The Stamp column in the Transactions table is a TIMESTAMP and there is an index on it. How could I change this query so it avoids full table scans? (that is, using Stamp outside of the day() function)

Thanks!

Riedsio
  • 9,758
  • 1
  • 24
  • 33
emx
  • 1,295
  • 4
  • 17
  • 28
  • I am not asking for "function indexes" - they do not exist. Rather I would like to transform this query the same way you could transform "SELECT * FROM table WHERE sqrt(column) = 2" into "SELECT * FROM table WHERE column = 4" – emx Dec 02 '10 at 13:15

6 Answers6

8

This is how I would do it:

add some extra fields: YEAR, MONTH, DAY or even HOUR, MINUTE depending on the traffic you expect. Then build a trigger to populate the extra fields, maybe subtracting the 3 hour interval in advance. Finally build some index on the extra fields.

Massimog
  • 166
  • 10
  • 4
    MySQL does not support function indexes - Massimog's solution is the only alternative to specifying every possible date range - however since your query is going probably going to extract 1/30th of the rows, using an index lookup will not be significantly faster than using a full table scan. – symcbean Dec 02 '10 at 12:22
  • You might be right on the performance consideration - never thought about that really. More interested into transforming my query. – emx Dec 02 '10 at 13:19
  • Looks like I what I wanted isn't really possible and your suggestion looks the best from a performance point of view. – emx Dec 02 '10 at 14:11
1

If the goal is just to avoid full table scans and you have a PRIMARY KEY (say named PK) for Transactions, consider adding covering index

ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)

Then

SELECT * FROM Transactions WHERE PK IN (SELECT PK FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1
 )

This query should not use full table scans (however optimizer may decide to use full scan, if number of rows in table is small or for whatever other statistical reason :) )

Better way may be is to use temporary table instead of subquery.

petr
  • 511
  • 4
  • 7
1

You can often rewrite the function so you have something that looks like WHERE Stamp=XXXX and XXXX is some expression. You could create a series of BETWEEN statements for each month, WHERE Stamp BETWEEN timestamp('2010-01-01 00:00:00') AND timestamp ('2010-01-01 23:59:59') OR Stamp BETWEEN ..., but I'm not certain this would use the index in this case. I'd build a column that was the day of the month as @petr suggests.

Joshua Martell
  • 7,074
  • 2
  • 30
  • 37
0

Calculate your desired Stamp value separately before you run your main query, i.e.

Step 1 - calculate the desired Stamp value

Step 2 - run a query where Stamp > (calculated value)

Because there's no calculation in step 2, you should be able to use your index.

Brian
  • 6,391
  • 3
  • 33
  • 49
0

If I understand it correctly, you basically want to return all rows where the stamp falls on the first in each month (having subtracted the 3 hours)? If (and this is a big if), you have a fixed window of, say the latest 6 months, you could just enumerate 6 range tests. But still, I'm not sure indexed access will be faster anyways.

select *
  from transactions
 where stamp between timestamp '2010-06-01 03:00:00' and timestamp '2010-06-02 02:59:59'
    or stamp between timestamp '2010-07-01 03:00:00' and timestamp '2010-07-02 02:59:59'
    or stamp between timestamp '2010-08-01 03:00:00' and timestamp '2010-08-02 02:59:59'
    or stamp between timestamp '2010-09-01 03:00:00' and timestamp '2010-09-02 02:59:59'
    or stamp between timestamp '2010-10-01 03:00:00' and timestamp '2010-10-02 02:59:59'
    or stamp between timestamp '2010-11-01 03:00:00' and timestamp '2010-11-02 02:59:59'
    or stamp between timestamp '2010-12-01 03:00:00' and timestamp '2010-12-02 02:59:59';

NB! I'm not sure how the millisecond part of the timestamp works. You may need to pad it accordingly.

Ronnis
  • 12,593
  • 2
  • 32
  • 52
0

Reworking petr's answer a bit to avoid the IN clause, and to make it for MyISAM or InnoDB.

For MyISAM

ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)

Or, for InnoDB, where the PK is implicitly included in every index,

ALTER TABLE Transactions ADD INDEX Stamp (Stamp)

Then

SELECT * 
FROM Transactions LEFT JOIN
  (
  SELECT PK 
  FROM Transactions 
  WHERE DAYOFMONTH(Stamp - interval 3 hour) = 1
  ) a ON Transactions.PK=a.PK

The subquery will have an index only execution, and the outer query will only pull the rows from the table where a.PK came through.

Riedsio
  • 9,758
  • 1
  • 24
  • 33