
I have a large table with more than 200,000 rows. Before inserting a new row I only need to check the last few thousand rows for duplicates, not the whole table. Currently I'm running this query for each row I want to add:

SELECT ID from table where date='' and time=''

If that query returns nothing, I insert the new row (see the sketch below).
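
For context, the full per-row pattern looks roughly like this (a sketch only; the table name readings and the literal values are placeholders, since the real names aren't shown):

-- 1. Check whether a row with the same date and time already exists
SELECT ID FROM readings WHERE date = '2018-01-24' AND time = '12:00:00';
-- 2. Only if the SELECT returned no rows does the application run the insert
INSERT INTO readings (date, time) VALUES ('2018-01-24', '12:00:00');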

The issue is that this takes a very long time, and as the table grows it only gets slower.

I tried using LIMIT and OFFSET: SELECT ID from table where date='' and time='' LIMIT 200000, 18446744073709551615. I thought this would only search rows from 200,000 onward to the end of the table, but running the query doesn't seem to be any faster.

My question is this: Is there a more efficient way to "skip ahead" in the database and only search a portion of the rows instead of all of the rows?


1 Answer


You should be using INSERT IGNORE together with a UNIQUE constraint on the columns that should be unique.

With INSERT IGNORE, MySQL will automatically detect when a new row would duplicate an existing UNIQUE key and silently skip the insert instead of raising an error. See this question for more information.
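
A minimal sketch of that setup, assuming the duplicate check is on the date and time columns from your query (the table name readings is a placeholder):

-- One-time setup: a composite UNIQUE key on the columns that define a duplicate
ALTER TABLE readings ADD UNIQUE KEY uq_date_time (date, time);
-- Inserts that would duplicate an existing (date, time) pair are now skipped silently
INSERT IGNORE INTO readings (date, time) VALUES ('2018-01-24', '12:00:00');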

Additionally, searching a multi-million row table should be fast as long as you have the correct indexes on it. You should not need to restrict the search to a subset of the rows (without a usable key, the database is forced to do a full table scan, which is what causes the delays you're describing).
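
If you want to confirm the index is actually being used, EXPLAIN will show it (again using the placeholder table name):

EXPLAIN SELECT ID FROM readings WHERE date = '2018-01-24' AND time = '12:00:00';
-- In the output, "key" should list uq_date_time rather than NULL, and "type"
-- should be const or ref rather than ALL (ALL means a full table scan).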

  • I'm searching for documentation on `INSERT IGNORE` and haven't yet found how it compares for duplicates. Does it check every column? – ez4nick Jan 24 '18 at 15:32
  • @ez4nick It compares for duplicates based on UNIQUE indexes (Remember, 1 index can cover multiple columns on the table). Add an index on the table that encompasses all the data that should be unique. – Blue Jan 24 '18 at 15:33