
I have a table which is going to hold logs from GPS devices in the following format:

id | device | time | latitude | longitude
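For reference, a minimal sketch of how I picture the table in MySQL (the table name, column types and storage engine below are placeholders, not a final design):

```sql
-- Sketch only: names and types are placeholders, not a final schema.
CREATE TABLE gps_log (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    device    VARCHAR(64)     NOT NULL,
    time      DATETIME        NOT NULL,
    latitude  DECIMAL(9,6)    NOT NULL,
    longitude DECIMAL(9,6)    NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;
```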

Currently, judging by the size of the logs (they are text files), it looks like there will be around 10 thousand records per day, and that will most probably increase to somewhere around 50 thousand.

The table must not contain any duplicates, and it is going to be updated frequently (roughly every 30 seconds), so I'm worried about possible performance issues if I pick the wrong approach.

Here are a few scenarios I thought about:

  1. Store all the records from the log file in a temporary table, then run a query across the two tables like the one in this answer (sketched after this list): https://stackoverflow.com/a/11767663/2857962
  2. Create a unique index on the columns device, time, latitude and longitude, and use INSERT IGNORE INTO (the most obvious way; also sketched below)
  3. Calculate an MD5 hash of the above 4 columns and compare it before sending the row to the database (there's a PHP app beneath this)
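To make the first two scenarios concrete, here are rough MySQL sketches. The table and staging-table names, the index name and the sample values are placeholders, and the anti-join only roughly follows the linked answer:

```sql
-- Scenario 2: a unique index over the four columns, so that
-- INSERT IGNORE silently skips rows that already exist.
ALTER TABLE gps_log
    ADD UNIQUE INDEX uq_device_time_pos (device, time, latitude, longitude);

INSERT IGNORE INTO gps_log (device, time, latitude, longitude)
VALUES ('unit-42', '2014-05-23 07:35:00', 52.520008, 13.404954);

-- Scenario 1: load the parsed log file into a staging table with the
-- same four columns, then copy over only the rows that are not yet
-- present in the main table.
INSERT INTO gps_log (device, time, latitude, longitude)
SELECT s.device, s.time, s.latitude, s.longitude
FROM   gps_log_staging AS s
LEFT JOIN gps_log AS g
       ON  g.device    = s.device
       AND g.time      = s.time
       AND g.latitude  = s.latitude
       AND g.longitude = s.longitude
WHERE  g.id IS NULL;
```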

So my question is: which of these would be the most efficient way to store such records in large numbers, this frequently?

  • Doesn't the time change for every log message? – Barmar May 23 '14 at 07:35
  • Sure, it does. The problem is that I have to read a log file from another server, parse it and then figure out which log message is new and which isn't. – NorthBridge May 23 '14 at 07:38
  • You're going to read the entire file every time, instead of keeping track of where you finished the last time? – Barmar May 23 '14 at 07:40
  • In order to use `INSERT IGNORE` you have to create a unique index on the columns. – Barmar May 23 '14 at 07:41
  • The log files are dynamic, since they come from another server - I can't make a diff or count the bytes read last time, or anything similar. – NorthBridge May 23 '14 at 07:47

0 Answers