0

Say I'm running MySQL 8 with a table data containing about 1M rows. And I want to filter a datetime column on date a range (using a date index).

CREATE TABLE `data` (
  `rowId` int NOT NULL AUTO_INCREMENT,
  `data` json NOT NULL,
  `created` DATETIME NOT NULL,                               -- <-- datetime column for a date index
  `created_date` DATE AS (cast(`created` as date)) NOT NULL, -- <-- generated date column
  PRIMARY KEY (`rowId`),
  INDEX (`created`),                                         -- <-- datetime index w/ cardinality ~ 1M 
  INDEX (`created_date`)                                     -- <-- date index w/ cardinality of 1250
  INDEX `created_cast` ((cast(`created` as date)))           -- <-- functional "date" index w/ cardinality of 1250
  --     ^ I WANT TO USE THIS INDEX, BUT DON'T KNOW HOW
) ENGINE=InnoDB;

Then let's filter rows only from 2018, let's say:

SELECT COUNT(*) FROM data
WHERE created >= CAST('2018-01-01' AS DATE) AND created < CAST('2019-01-01' AS DATE);
-- Query time: 0.16 s
-- EXPLAIN shows: key: created, Using where; Using index
-- Uses the whole datetime index w/ cardinality of ~ 1M 


SELECT COUNT(*) FROM data 
WHERE created_date BETWEEN CAST('2018-01-01' AS DATE) AND CAST('2018-12-31' AS DATE);
-- Query time: 0.09 s
-- EXPLAIN shows: key: created_date, Using where; Using index
-- Uses the date index w/ cardinality of 1250


SELECT COUNT(*) FROM data
WHERE CAST(created AS DATE) BETWEEN CAST('2018-01-01' AS DATE) AND CAST('2018-12-31' AS DATE);
-- Query time: 0.35 s, that's slow!
-- EXPLAIN shows: key: NULL, Using where
-- Doesn't use any index at all! 

Is there a way to use this functional index?

The queries above use either created (datetime) or created_date (date from generated column) indexes.

Is there a way to use functional index created_cast directly? I've even tried with USE INDEX or FORCE INDEX, but no luck.

Thank you all :)

ΔO 'delta zero'
  • 3,506
  • 1
  • 19
  • 31

1 Answers1

0

I would rewrite your queries and remove the CAST and use the he generated column, as you have no conversion in it, but it needs space.

CREATE TABLE data (created datetime,
`created_date` DATE AS (cast(`created` as date)) NOT NULL,  INDEX testin (`created`),                                         -- <-- datetime index w/ cardinality ~ 1M 
  INDEX (`created_date`))
INSERt INTO data (created) VALUEs('2018-02-01'),('2018-02-01'),('2018-02-01'),('2018-02-01')
    SELECT COUNT(*) FROM data
    WHERE YEAR(`created`) = 2018;
    
| COUNT(*) |
| -------: |
|        4 |
    SELECT COUNT(*) FROM data FORCE INDEX (testin)
    WHERE DATE(created) BETWEEN '2018-01-01' AND '2018-12-31';
| COUNT(*) |
| -------: |
|        4 |
    EXPLAIN SELECT COUNT(*) FROM data FORCE INDEX (testin) 
    WHERE DATE(created) BETWEEN '2018-01-01' AND '2018-12-31'
id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows | filtered | Extra                   
-: | :---------- | :---- | :--------- | :---- | :------------ | :----- | :------ | :--- | ---: | -------: | :-----------------------
 1 | SIMPLE      | data  | null       | index | null          | testin | 6       | null |    4 |   100.00 | Using where; Using index
    SELECT COUNT(*) FROM data 
    WHERE created BETWEEN '2018-01-01 00:00:00' AND '2018-12-31 23:49:59';
| COUNT(*) |
| -------: |
|        4 |
EXPLAIN     SELECT COUNT(*) FROM data 
    WHERE created BETWEEN '2018-01-01 00:00:00' AND '2018-12-31 23:49:59';
id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows | filtered | Extra                   
-: | :---------- | :---- | :--------- | :---- | :------------ | :----- | :------ | :--- | ---: | -------: | :-----------------------
 1 | SIMPLE      | data  | null       | index | testin        | testin | 6       | null |    4 |   100.00 | Using where; Using index

db<>fiddle here

nbk
  • 45,398
  • 8
  • 30
  • 47
  • Thank you for your answer. `WHERE YEAR(`created`) = 2018` seems very elegant, but it uses the datetime index and still runs 0.225 s slow over my 1M dataset. Second solution, `WHERE DATE(created) BETWEEN ...` uses no index and runs 0.653 s slow and `WHERE DATE(created_date) BETWEEN ...` uses date index (from a generated column), and runs at 0.150 s. Neither of those use that functional index. – ΔO 'delta zero' Dec 05 '20 at 23:58
  • see my changed answer and try again please – nbk Dec 06 '20 at 00:29
  • This uses the datetime index and runs at 0.424 s... I've tried with `FORCE INDEX (created_date)` which only works for `WHERE created_date ...`, and also `FORCE INDEX (created_cast)`, which doesn't seem to work at all. – ΔO 'delta zero' Dec 06 '20 at 00:57
  • I'm basically looking for a way to use that functional index (`INDEX `created_cast` ((cast(`created` as date)))`). Without the use of a generated column, which clutter the schema. – ΔO 'delta zero' Dec 06 '20 at 00:58
  • yes i understand, see my last approach, but doubt that will improve that much, because here als the where has priority – nbk Dec 06 '20 at 01:46
  • this works the same about my first query, still using the 1M index... But thank you for trying :-) – ΔO 'delta zero' Dec 07 '20 at 15:26