0

I'm trying to find the gaps in temporal data, but the problem is my data is not a fixed interval sequence, so I think I cannot use this solution: How to find gaps in sequential numbering in mysql?
My temporal data is in milliseconds (bigint), normally there is at least one value every second, so I was thinking in something like:

SELECT * 
FROM mytable AS t1
LEFT JOIN mytable AS t2 ON t2.time > t1.time AND (t2.time - t1.time) < 1000
WHERE t2.id IS NULL
ORDER BY t1.time ASC
LIMIT 100

But that query is taking forever (my table has more than 100M records).
Is there any faster solution? // MySQL version 5.7.14
I've also found this: MySQL / MariaDB: how to find gaps in timebased data? but is using multiple subselects which I think will be even slower

MCRE:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `time` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `TIME` (`time`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

INSERT INTO test (time) VALUES 
(10), (30), (50), (60), (65), (100), (150), (200), (350), (1000), (2500), (2600), (2660), (2700), (2800), (3000), (3500), (5000), (5100), (5150);

http://sqlfiddle.com/#!9/a8f5a2/2/0

Enrique
  • 4,693
  • 5
  • 51
  • 71
  • You are comparing every row against every other row, try ordering t1 and joining t2 where t2.time > t1.time? Your WHERE clause also will not work as there is no `mytable` - if you mean t1 then it will never be null, maybe change to t2? – Luke Feb 11 '20 at 12:52
  • Also, as this kind of problem is considerably simplified in MySQL 8.0+, it's worth indicating your version – Strawberry Feb 11 '20 at 13:01
  • @Strawberry I've edited the query, running it again with order by and > condition is that what you were saying? – Enrique Feb 11 '20 at 13:27
  • What should the output of the query look like? Surely you don't want to see a separate row for every single missing number from `0` - `5150`!?! – Strawberry Feb 11 '20 at 13:47
  • The query from my MCRE (sqlfiddle) is working as expected, the output is the time before every gap. In my example "1000" will be an output because the next value is 2500 and there are more than 1000ms between them. The problem is the query is really slow for 100M of rows – Enrique Feb 11 '20 at 14:12
  • @Enrique Were you able to find any solution? I came across a similar problem. – Umer Mehmood Jul 13 '22 at 08:58

0 Answers0