Background:
I built a scraper in Python (not sure if that matters). I scrape the website and update my html table. The main table stores autogenerated_id, url, raw_html, date_it_was_scrapped, and last_date_the_page_was_updated (provided by the website). My table has many duplicate urls, which it shouldn't, so I am planning to make url unique in the database.
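Before the unique constraint can be added, the existing duplicates have to be removed. A minimal sketch of that order of operations, using an in-memory SQLite database as a stand-in for the real table (MySQL's DDL syntax differs slightly, and the table contents here are invented for illustration):

```python
import sqlite3

# Stand-in for the real table, seeded with a duplicate url.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE html_table (
    autogenerated_id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    raw_html TEXT,
    date_it_was_scrapped TEXT,
    last_date_the_page_was_updated TEXT
);
INSERT INTO html_table (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES ('http://example.com/a', '<p>old</p>', '2023-01-01', '2023-01-01'),
       ('http://example.com/a', '<p>new</p>', '2023-02-01', '2023-02-01');
""")

# Remove duplicates, keeping only the most recently inserted row per url...
conn.execute("""
DELETE FROM html_table
WHERE autogenerated_id NOT IN (
    SELECT MAX(autogenerated_id) FROM html_table GROUP BY url
)
""")

# ...then the unique index can be created without a duplicate-key error.
conn.execute("CREATE UNIQUE INDEX idx_html_table_url ON html_table (url)")
conn.commit()

rows = conn.execute("SELECT url, raw_html FROM html_table").fetchall()
print(rows)
```

Doing the delete first matters: creating the unique index on a table that still contains duplicate urls fails outright.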
Desired outcome:
I only want to insert a row if the url doesn't exist, and update the html only if last_date_the_page_was_updated > date_it_was_scrapped.
Solution:
The following Stack Overflow post shows how.
I haven't tested it because of the warning in the accepted answer: an INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe.
This is what I plan to do, based on that post:
INSERT INTO html_table (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES (the data)
ON DUPLICATE KEY UPDATE
url = VALUES(url),
raw_html = VALUES(raw_html),
date_it_was_scrapped = VALUES(date_it_was_scrapped),
last_date_the_page_was_updated = VALUES(last_date_the_page_was_updated)
WHERE last_date_the_page_was_updated > date_it_was_scrapped
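One wrinkle with the statement above, separate from the unsafe warning: MySQL's INSERT ... ON DUPLICATE KEY UPDATE does not accept a WHERE clause at all, so the query as written would be rejected. The usual workaround is to move the condition into the assignments with IF(), e.g. raw_html = IF(VALUES(last_date_the_page_was_updated) > date_it_was_scrapped, VALUES(raw_html), raw_html). As a self-contained sketch of the intended behaviour, here is the same logic in SQLite, whose ON CONFLICT ... DO UPDATE clause does support a trailing WHERE condition (column names as above; dates stored as ISO-8601 strings so plain text comparison orders them correctly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE html_table (
    url TEXT PRIMARY KEY,
    raw_html TEXT,
    date_it_was_scrapped TEXT,
    last_date_the_page_was_updated TEXT
)
""")

# Insert a new url, or update an existing row only when the page was
# updated after we last scraped it.
upsert = """
INSERT INTO html_table (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES (?, ?, ?, ?)
ON CONFLICT (url) DO UPDATE SET
    raw_html = excluded.raw_html,
    date_it_was_scrapped = excluded.date_it_was_scrapped,
    last_date_the_page_was_updated = excluded.last_date_the_page_was_updated
WHERE excluded.last_date_the_page_was_updated > html_table.date_it_was_scrapped
"""

# First scrape: no conflict, the row is inserted.
conn.execute(upsert, ("http://example.com/a", "<p>v1</p>", "2023-01-10", "2023-01-05"))
# Page unchanged since that scrape: the WHERE condition is false, row kept as-is.
conn.execute(upsert, ("http://example.com/a", "<p>ignored</p>", "2023-01-11", "2023-01-05"))
# Page updated after the last scrape: the condition is true, row is refreshed.
conn.execute(upsert, ("http://example.com/a", "<p>v2</p>", "2023-02-01", "2023-01-20"))

row = conn.execute(
    "SELECT raw_html FROM html_table WHERE url = ?", ("http://example.com/a",)
).fetchone()
print(row)
```

The example urls, html snippets, and dates are made up for illustration; the point is that the condition lives in the upsert itself rather than in an unsupported WHERE clause on ON DUPLICATE KEY UPDATE.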
Question:
What is unsafe about it, and is there a safe way to do it?