0

I've created a scraping script which every hour go through different articles and add them to a database. However in this database table i've set unique key on date and title, so it wont add the same articles over and over. This however seem to great a weird effect where the ids on the news auto increment id seem to count up even though the news is not added? How can i fix this?

$insert_news = $con->prepare("INSERT IGNORE INTO news (`title`, `url`, `image_url`, `date`, `news_text`, `website_id`) VALUES (?, ?, ?, ?, ?, 2)");
$insert_news->bind_param("sssss", $hltv_title, $hltv_link, $hltv_img, $hltv_date, $full_text_dont);
$insert_news->execute();
Peter Pik
  • 11,023
  • 19
  • 84
  • 142
  • 1
    When an insert fails, `auto_increment` is discarded. MySQL actually tries to use it, fails, and forgets about the number (to put it simply). Even if it seems as bad or counter intuitive, it's actually **desired**. Databases work in concurrent environment and this number "dropping" has its reason. Basically you can't fix it because `auto_increment` **doesn't provide sequential numbers**. It just provides next **higher** number. Nothing bad happens if you leave it as is. – N.B. Apr 26 '15 at 22:26

1 Answers1

0

Possible duplicate of INSERT IGNORE increases auto increment counter even no record is added? and prevent autoincrement on MYSQL duplicate insert. If you want to prevent gaps in your auto incremented ID's, you should consider changing the logic. For example, first check if the record exists (using a SELECT-query) before attempting to insert it.

Community
  • 1
  • 1
Martijn
  • 5,491
  • 4
  • 33
  • 41
  • I did that at the beginning, but the another guy here told me that it was bad practice to make a select query before? – Peter Pik Apr 26 '15 at 22:17
  • What would happen if i remove ignore? – Peter Pik Apr 26 '15 at 22:17
  • Never check if record exists before inserting it. It's the worst thing you can do. Always insert and if insert fails - voila, you just prevented duplicates. Don't try to avoid gaps in autoincremented ids, it just leads to a world of pain, confusion and general WTF when really, really weird stuff starts to happen. – N.B. Apr 26 '15 at 22:27
  • The only reason I can think of not to do it is because of concurrency issues (an INSERT IGNORE is atomic, a SELECT + INSERT not). However, this can also be handled in code (lock, semaphore, mutex) and it just feels more natural to first check if something is allowed instead of relying on the errors when an insert fails. Perhaps you could explain why checking is the worst thing you can do? (I agree however that you should not try to avoid gaps in autoincremented ids, simply because gaps are not a problem.) – Martijn Apr 27 '15 at 06:57
  • Database takes care of data, and it's also the fastest system in the whole communication chain that can detect duplicates. Even if it "feels" natural, using locks and what not simply delegates the authority from database to the programming language if you opt to check if something exists first. But there's already system in place - prevent duplicates on database level via `unique` and just insert. You avoid using locks, you avoid false positives (due to concurrency) and you produce less code. You get all the benefits with 0 downsides. No need to make code complex for no reason. – N.B. Apr 27 '15 at 12:21