
I have to extract a rather big chunk of data from a database (MySQL / MariaDB), so I decided to fetch it in small chunks (LIMIT = 1000 rows). In total I need to extract ~9228 rows (which does not sound like much, but the fetch time reaches 100-120 seconds when I try to get all 9228 rows at once with one query).

When I fetch the first 8 chunks (1000 rows each), everything is fine: ~0.4 seconds per query. But when I try to extract the last 228 rows, everything gets really slow: 80 seconds if I use LIMIT 1000 OFFSET 9000, or 50 seconds if I use the exact number of remaining rows (LIMIT 228 OFFSET 9000). However, the query used to get the total number of rows takes 30 seconds by itself, so the two queries together come to 80 seconds again.

My SQL query to get the data looks like the following:

SELECT events.eventid, functions.triggerid FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
(events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%')   # a single '%MSK-AP%' is much slower than the OR;
                                    # parentheses needed, since AND binds tighter than OR
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799'
GROUP BY events.eventid
ORDER BY events.eventid DESC
LIMIT 1000 OFFSET 0;   # OFFSET can be 0, 1000, 2000, ... 8000, 9000

My SQL query to get the total number of rows (it is slow: 30 seconds!) is as follows:

SELECT COUNT(DISTINCT events.eventid) FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
(events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%')
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799';

My database version:

protocol_version: 10
version: 5.5.60-MariaDB
version_comment: MariaDB Server
version_compile_machine: x86_64
version_compile_os: Linux

Why is the query for the last chunk so slow compared to the others, and what can I do to solve the issue? Can a temporary database table help in this case?
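
To make the temporary-table idea concrete, here is a rough sketch of what I have in mind (untested; tmp_events is just a placeholder name): materialize the filtered ids once, then count and paginate over the small result.

CREATE TEMPORARY TABLE tmp_events AS     # hypothetical name, untested sketch
SELECT events.eventid, functions.triggerid FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
(events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%')
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799'
GROUP BY events.eventid;

SELECT COUNT(*) FROM tmp_events;         # total number of rows, now cheap

SELECT eventid, triggerid FROM tmp_events
ORDER BY eventid DESC
LIMIT 1000 OFFSET 9000;                  # pagination over only ~9228 rows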

Why I am not sure that the answer to the question Why does MYSQL higher LIMIT offset slow the query down? (https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down) fits my case:

Because the problem does not correlate with the OFFSET size, e.g.:

LIMIT 100 OFFSET 9100; takes 0.25 seconds, BUT LIMIT 100 OFFSET 9200; takes 114 seconds!

So the problem appears when OFFSET + LIMIT is close to or larger than the total number of rows (9228)!

Artiom Kozyrev
  • Relevant https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down. – snakecharmerb May 07 '20 at 16:07
  • Does this answer your question? [Why does MYSQL higher LIMIT offset slow the query down?](https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down) – Dave May 07 '20 at 16:11
  • @snakecharmerb @Dave I have seen that answer, but I doubt it explains such a great speed difference between OFFSET 8000 and OFFSET 9000 (200 times slower). I also tried OFFSET 9000 with LIMIT 100 and everything was really fast (about 0.7 seconds), but when I do OFFSET 9200 with LIMIT 28 everything is really slow. So I do not think that answer perfectly fits the situation. – Artiom Kozyrev May 07 '20 at 16:18
  • I thought it was relevant, not that it was the solution :) Perhaps the db has to search through a lot of non-qualifying records to get to 9000. Has the query performance been analysed with `EXPLAIN`? – snakecharmerb May 07 '20 at 16:33
  • @snakecharmerb the database is used for data from Zabbix monitoring; we monitor the availability of wifi points plus a small number of other services (wifi points are about 90% of the total number of monitored services), so the majority of alarms are related to wifi and match the filter (```events.name LIKE```). Even if I take a very small step (```LIMIT 10```), if it touches the last matching rows, everything is slow. Also I do not understand what the problem with the ```count(distinct)``` query is, why it takes 30 seconds. I guess the two facts are somehow related. I'll try with ```EXPLAIN```. Thank you for the advice! – Artiom Kozyrev May 07 '20 at 16:59
  • @snakecharmerb well, I have 10,493,218 rows in the events table, can that explain why the problem exists? :) Can a temporary table (as a fast solution) or partitioning (as a long-term solution) help in this case? – Artiom Kozyrev May 07 '20 at 17:16
  • I'm not good enough at SQL to say. But I'd consider indexing at least `events.name`, maybe even a compound index over `name`, `value` and `clock` if they aren't already indexed (see the sketch after this thread). On a copy of the db first though, to avoid the risk of locking up the production db :) Good luck! – snakecharmerb May 07 '20 at 17:21
  • @snakecharmerb I'll try different ways to improve it! – Artiom Kozyrev May 07 '20 at 18:13
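
A minimal sketch of the two diagnostics suggested in the comments above. The index name, column order, and prefix length are assumptions, not something tested against this schema, and as noted it should be tried on a copy of the database first.

# See how the server actually executes the slow count query:
EXPLAIN SELECT COUNT(DISTINCT events.eventid) FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
(events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%')
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799';

# Hypothetical compound index over the filtered columns; the 64-char
# prefix on name is a guess to stay under InnoDB key-length limits.
ALTER TABLE events ADD INDEX idx_name_value_clock (name(64), value, clock);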

1 Answer


OFFSET sucks performance.

A better way is to "remember where you left off".

Discussion: http://mysql.rjweb.org/doc.php/pagination
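
Roughly, applied to the query in the question, it could look like this (a sketch, not tested; :last_seen_eventid is a placeholder the application must remember from the previous page):

# First page: no OFFSET at all.
SELECT events.eventid, functions.triggerid FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
(events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%')
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799'
GROUP BY events.eventid
ORDER BY events.eventid DESC
LIMIT 1000;

# Every next page: seek below the smallest eventid already delivered,
# instead of counting past it with OFFSET.
SELECT events.eventid, functions.triggerid FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
events.eventid < :last_seen_eventid   # placeholder bound by the application
AND (events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%')
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799'
GROUP BY events.eventid
ORDER BY events.eventid DESC
LIMIT 1000;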

Why is it as slow as reading everything?

When using OFFSET, the query first counts off the number of rows given by the OFFSET, then delivers the number of rows given by the LIMIT. So each successive chunk gets slower and slower, and the final OFFSET takes about the same amount of time as reading the entire table.
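
To put rough numbers on that (an illustration using the figures from the question, not a measurement): fetching all ~9228 rows in ten OFFSET-based chunks makes the server step over 0 + 1000 + 2000 + ... + 9000 = 45,000 rows of offset in total, on top of the ~9228 rows actually returned. That offset term grows quadratically with the number of pages, while "remember where you left off" keeps every page at roughly constant cost.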

Rick James
  • Unfortunately that is not the cause of the problem in this case; it seems like it is not a query issue. I guess the DB needs more resources. – Artiom Kozyrev May 28 '20 at 11:21
  • @ArtiomKozyrev - No. While you might be able to throw enough hardware resources at the problem, that is _not_ a viable way to scale. As your data grows, the problem grows -- quadratically! (That's much worse than "linearly", but not as bad as "exponentially".) – Rick James May 28 '20 at 22:05