4

I have a python script using executemany to bulk insert rows into a MySQL table. The data is retrieved from different APIs, so every now and then there is unexpected data which leads to a row causing exception.

If I understand correctly - when calling executemany with 1,000 rows and one of them is problematic - the entire bulk is not inserted.

I want to find a way to submit 1,000 records and successfully load the ones that are not problematic. So for example - if one of a thousand is problematic it will not be loaded, but all other 999 will be loaded.

What's the best practice on that? I'm thinking of catching an exception and creating a fallback to re-submit all 1000 one by one - but it seems like there must be a better way to achieve the same outcome.

Advice?

Aamit
  • 181
  • 4
  • 16
  • what exceptions are you getting? What makes you think a retry will work? – danblack Dec 04 '18 at 01:37
  • If for example only record 364 out of a batch of 1000 is problematic - I'd like to retry inserting all other 999 records individually. Because I don't know which record caused the exception, and I because one exception fails the whole batch - I'm thinking of retrying all 1000 as single statements (execute instead of execute many). An exception could be anything from an empty PK field, unexpected encoding, etc. – Aamit Dec 04 '18 at 02:33
  • yes retrying will narrow down the error to a minimum number of row(s). – danblack Dec 04 '18 at 02:39

2 Answers2

4

Doing an "INSERT OR IGNORE" statement at the beginning of your "executemany" query will let you do exactly this - it will only add the values that don't bring an error.

The only downside is that you no longer can see what error is happening. For example,

Original database:

('kaushik', 3)
('maria', 4)
('shreya', 38)

Query: (in python)

listofnames = [
('kaushik', 3),
('maria', 4),
('jane', 56)
]

c.executemany("INSERT OR IGNORE INTO bob (name, number) VALUES (?,?)", 
listofnames)

Final db:

('kaushik', 3)
('maria', 4)
('shreya', 38)
('jane', 56)
Alaska
  • 341
  • 2
  • 9
  • `INSERT IGNORE` worked for me, not sure if this was changed from `INSERT OR IGNORE INTO` or if it's used in a different situation – busterroni Oct 19 '22 at 21:10
1

When inserting, executemany batches all the data rows together and tries to insert them all with one command. As far as I know, there is no way to handle the exception raised by one failed insertion without ruining the entire batch of insertions. If one row fails, the whole command fails.

Here's what it looks like (example taken from the MySQL docs). If you tell it to do this:

data = [
  ('Jane', date(2005, 2, 12)),
  ('Joe', date(2006, 5, 23)),
  ('John', date(2010, 10, 3)),
]
stmt = "INSERT INTO employees (first_name, hire_date) VALUES (%s, %s)"
cursor.executemany(stmt, data)

executemany will do this:

INSERT INTO employees (first_name, hire_date)
VALUES ('Jane', '2005-02-12'), ('Joe', '2006-05-23'), ('John', '2010-10-03')

If you think this is going to be a rare occurrence, your idea of retrying each insertion individually will work. Something like:

try:
    cursor.executemany(stmt, data)
except ___Error:  # fill in the blank
    for datum in data:
        try:
            cursor.execute(stmt, datum)
        except ___Error:
            # handle exception, eg print warning
            ...

If you think this is going to be a frequent problem, then it will probably be more performant to forgo executemany and just do this:

for datum in data:
    try:
        cursor.execute(stmt, datum)
    except ___Error:
        # handle exception, eg print warning
        ...
ibonyun
  • 425
  • 3
  • 11