0

I'm using MySQL tables.

Sample table EMP (primary_key on (EID, DOJ)) :

ENAME EID DEPT SAL DOJ (YYYY-MM-DD)
A 6 ee 2000 2021-03-01
B 5 me 2020 2021-04-30
C 3 it 2000 2020-12-27
D 4 cv 2020 2020-10-31
E 1 it 2000 2021-01-01
F 2 it null 2021-02-28
G 7 ee null 2020-11-20

I have one job that deletes and inserts data into a similarly structured table.

The process needs to delete non-monthly data. That is, it will delete rows with EID 6, 3, 1, 7: These rows have DOJ that is not month-end.

The query I came up with:

-- DOJ is of type DATE
DELETE FROM EMP WHERE LAST_DAY(DOJ) <> DOJ ;

It works as expected. But, because there is large data (~5 million) this is slow.

I understand that because of LAST_DAY() function, I'm loosing the index advantage of DOJ. Can you please suggest how can I improve the query?

adarsh
  • 1,393
  • 1
  • 8
  • 16

2 Answers2

2

You are right about the index. It is no use here. The only idea that comes to mind is a generated column to tell you about the date being the month's end or not. Thus you'd have a column you can index and use in your query:

create table emp 
(
  ename varchar(100),
  ...
  doj date,
  is_month_end bool as (doj = last_day(doj))
);

create index idx_month_ends on emp (is_month_end);

delete from emp where not is_month_end;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=397388b70bb1f459bbefce630ad27ac4

An index can only help, though, if this is about a very small part of the data in the table, say 1%. With many more rows to delete it makes more sense to read the whole table.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • MySQL 8.0.13 and later even allow creating a functional index without adding the generated column. It's described in [Functional Key Parts](https://dev.mysql.com/doc/refman/8.0/en/create-index.html). – Bill Karwin Feb 12 '21 at 15:20
  • @Bill Karwin: Ah, finally. I didn't know that. Thanks for the information and the link. – Thorsten Kettner Feb 12 '21 at 15:22
  • @ThorstenKettner Because it's a long running prod table with lots of downstream consumers, cannot add a new column (BTW, I'm still on MySQL 5.0). For now, have gone ahead with the query in question. Accepting the answer as it should be the way forward. – adarsh Feb 13 '21 at 04:37
-1
Where (month(doj) in (9,4,6,11) and day(doj) <> 30) or
      (month(doj) in (1,3,5,7,8,10,12) and day(doj) <> 31) or
      (month(doj) in (2) and day(doj) not in(28,29))

? You can tweak feb a bit

P.Salmon
  • 17,104
  • 2
  • 12
  • 19
  • 1
    I'm not very good at performance aspects of databases. Can you please explain how this is better than the query in question? I mean, aren't we ignoring the fact that `DOJ` is indexed ? – adarsh Feb 12 '21 at 15:03
  • doj ISNT indexed eid,doj is..and I don't know that this approach is quicker(and can't test it) than last_day..It's just a suggestion try, running an explain on this and your query – P.Salmon Feb 12 '21 at 15:07
  • There is no advantage over the original where clause. (And it's even less readable and less reliable when it comes to leap years.). The index can't be used here either and the whole table must hence be read. Sorry, I am downvoting, because this suggestion is not a good idea. – Thorsten Kettner Feb 12 '21 at 15:17