
I have a large table with more than 200,000 rows. Before inserting a new row I only need to check the last few thousand rows for duplicates, not the whole table. Currently I'm running this query for each row I want to add:

SELECT ID from table where date='' and time=''

If that query returns nothing, I insert the new row (see the sketch below).
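
For context, the full per-row pattern looks roughly like this (a sketch only; the table name readings and the literal values are placeholders, since the real names aren't shown):

-- 1. Check whether a row with the same date and time already exists
SELECT ID FROM readings WHERE date = '2018-01-24' AND time = '12:00:00';
-- 2. Only if the SELECT returned no rows does the application run the insert
INSERT INTO readings (date, time) VALUES ('2018-01-24', '12:00:00');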

The issue is that this takes a very long time, and as the table grows it only gets slower.

I tried using LIMIT and OFFSET: SELECT ID from table where date='' and time='' LIMIT 200000, 18446744073709551615. I thought this would only search rows from 200,000 onward to the end of the table, but running the query doesn't seem to be any faster.

My question is this: Is there a more efficient way to "skip ahead" in the database and only search a portion of the rows instead of all of the rows?


1 Answer


You should be using INSERT IGNORE together with a UNIQUE constraint on the columns that should be unique.

With INSERT IGNORE, MySQL will automatically detect when a new row would duplicate an existing UNIQUE key and silently skip the insert instead of raising an error. See this question for more information.
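
A minimal sketch of that setup, assuming the duplicate check is on the date and time columns from your query (the table name readings is a placeholder):

-- One-time setup: a composite UNIQUE key on the columns that define a duplicate
ALTER TABLE readings ADD UNIQUE KEY uq_date_time (date, time);
-- Inserts that would duplicate an existing (date, time) pair are now skipped silently
INSERT IGNORE INTO readings (date, time) VALUES ('2018-01-24', '12:00:00');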

Additionally, searching a multi-million row table should be fast as long as you have the correct indexes on it. You should not need to restrict the search to a subset of the rows (without a usable key, the database is forced to do a full table scan, which is what causes the delays you're describing).
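
If you want to confirm the index is actually being used, EXPLAIN will show it (again using the placeholder table name):

EXPLAIN SELECT ID FROM readings WHERE date = '2018-01-24' AND time = '12:00:00';
-- In the output, "key" should list uq_date_time rather than NULL, and "type"
-- should be const or ref rather than ALL (ALL means a full table scan).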

  • I'm searching for documentation on `INSERT IGNORE` and haven't yet found how it compares for duplicates. Does it check every column? – ez4nick Jan 24 '18 at 15:32
  • @ez4nick It compares for duplicates based on UNIQUE indexes (Remember, 1 index can cover multiple columns on the table). Add an index on the table that encompasses all the data that should be unique. – Blue Jan 24 '18 at 15:33