1

I'm trying to optimize the speed of this query:

      SELECT t.t_date td, v.visit_date vd
      FROM temp_dates t
      LEFT JOIN visits v ON DATE_FORMAT(v.visit_date, '%Y-%m-%d') = t.t_date
      ORDER BY t.t_date

v.visit_date is of type DATETIME and t.t_date is a string of format '%Y-%m-%d'. Simply creating an index on v.visitdate didn't improve the speed. Therefore I intended to try the solution @oysteing gave here: How to optimize mysql group by with DATE_FORMAT I successfully created a virtual column by this SQL ALTER TABLE visits ADD COLUMN datestr varchar(10) AS (DATE_FORMAT(visit_date, '%Y-%m-%d')) VIRTUAL; However when I try to create an index on this column by CREATE INDEX idx_visit_date on visits(datestr) I get this error:

#1901 - Function or expression 'date_format()' cannot be used in the GENERATED ALWAYS AS clause of datestr

What am I doing wrong? My DB is Maria DB 10.4.8

Best regards - Ulrich

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Sempervivum
  • 928
  • 1
  • 9
  • 21

4 Answers4

3

date_format() cannot be used for persistent generated columns either. And in an index it cannot be just virtual, it has to be persisted.

I could not find an explicit statement in the manual, but I believe this is due to the fact that the output of date_format() can depend on the locale and isn't strictly deterministic therefore.

Instead of date_format() you can build the string using deterministic functions such as concat(), year(), month(), day() and lpad().

...
datestr varchar(10) AS (concat(year(visit_date),
                               '-',
                               lpad(month(visit_date), 2, '0'),
                               '-',
                               lpad(day(visit_date), 2, '0')))
...

But as I already mentioned in a comment, you're fixing the wrong end. Dates/times should never be stored as strings. So you should rather promote temp_dates.t_date to a date and use date() to extract the date portion of visit_date in the generated, indexed column

...
visit_date_date date AS (date(visit_date))
...

And you might also want to try to also index temp_dates.t_date.

sticky bit
  • 36,626
  • 12
  • 31
  • 42
  • Many thanks, switching to DATETIME and using DATE improved speed a lot, by more than a factor of 10. However additional questions arose. I'm gonna update my question later. – Sempervivum Feb 10 '21 at 22:14
  • @Sempervivum: Glad to hear that. But for additional questions consider asking a new one so that this one can stay focused. – sticky bit Feb 10 '21 at 22:17
  • OK, I'm gonna do so and mark your answer as solution here. – Sempervivum Feb 10 '21 at 22:21
  • 1
    Note, the 3 args form of [date_format](https://mariadb.com/kb/en/date_format/) is deterministic. – danblack Feb 11 '21 at 05:17
1

Does this work for you?

SELECT t.t_date td, v.visit_date vd
  FROM temp_dates t
  LEFT JOIN visits v ON DATE(v.visit_date) = DATE(t.t_date)
 ORDER BY t.t_date

If so, there's a workable solution to your problem:

  1. Add a DATE column using the deterministic DATE() function on your visit_date object. Like this.

    ALTER TABLE visits ADD COLUMN dateval DATE AS (DATE(visit_date)) VIRTUAL; 
    CREATE INDEX idx_visit_date on visits(dateval);
    
  2. Then create a virtual column in the other table (the one with the nicely formatted dates jammed into your VARCHAR() column.

    ALTER TABLE temp_dates ADD COLUMN dateval DATE AS (DATE(t_date)) VIRTUAL;
    CREATE INDEX idx_temp_dates_date on temp_dates (dateval);
    

This works because DATE() is deterministic, unlike DATE_FORMAT().

Then your query should be.

SELECT t.t_date td, v.visit_date vd
  FROM temp_dates t
  LEFT JOIN visits v ON v.dateval = t.dateval
 ORDER BY t.t_date

This solution gives you indexes on (virtual) DATE columns. That's nice because index matching on such columns is efficient.

But, your best solution is to change the datatype of temp_date.t_date from VARCHAR() to DATE.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
1

DATE_FORMAT(expr, format) cannot be used in virtual columns as it depends on the locale of the connection (MariaDB issue MDEV-11553).

A 3 argument form was created to date_format that adds locale.

DATE_FORMAT(visit_date, '%Y-%m-%d', 'en_US') is possible to use in virtual column expressions in MariaDB-10.3+ stable versions.

Using DATE or altering your query not to use functions around column expressions is definitely recommended.

danblack
  • 12,130
  • 2
  • 22
  • 41
0

Functions are not "sargeable".

Consider:

ON  v.visit_date >= t.t_date
AND v.visit_date  < t.t_date + INTERVAL 1 DAY
Rick James
  • 135,179
  • 13
  • 127
  • 222