13

Is it a valid PyMySQL operation to cursor.execute(…) multiple times before executing connection.commit(), or does connection.commit() need to occur after each execute statement for the results to be stored properly? The idea is to eliminate as many redundant statements as possible, as the process is long-running.

Code structure:

with connection.cursor() as cursor:
    …
    cursor.execute(SQLtype1, rowToInsert)

    cursor.execute(SQLtype2, otherToInsert)

    connection.commit() # does this account for both execute statements, or just the last?

I’ve reviewed the following:

PyMySQL execute/commit example, but with only one example having only one execute statement.

Python MySQLdb example, but with an example showing a commit after each execute statement

Python SQLite example, shows multiple execute statements before a commit, but it's uncertain whether SQLite handles differently

Note: As the SQL queries are different, executemany doesn’t appear to be an option.

ballade4op52
  • 2,142
  • 5
  • 27
  • 42

1 Answers1

16

No, that's the intended purpose of cursor.commit(). What you're describing is auto-committing, which may or may not be enabled for your database driver. Check with its documentation to be sure.

If your first query succeeds but the second one fails, you probably don't want your database to be left in a corrupt state where some rows were inserted but others were not. All the changes you perform are staged until you commit them to the database with cursor.commit(). This allows you to perform multiple queries at once and automatically roll back the changes if one of them fails.

Blender
  • 289,723
  • 53
  • 439
  • 496
  • 2
    If I understand you, you're saying I would be replicating the function of auto-committing by calling commit after each execute statement. However, it may be safer to use auto-commit to avoid potentially corrupted states. – ballade4op52 Jan 24 '18 at 22:46
  • 8
    @Phillip: Not quite. Pretend you run PayBuddy and userA sends userB $1,000. You can do this by first decreasing the balance of userA and then increasing the balance of userB. These two queries must *both* succeed or the entire transaction fails. You can't be left with userA missing $100 and userB not receiving it because of some error with your application. You should not commit the changes in between the two queries because that would momentarily leave your database in an inconsistent state. You want to *avoid* auto-committing if you have groups of queries that cannot be partially executed. – Blender Jan 24 '18 at 23:14
  • @Blender, thanks for your detailed answer... a follow up question please: is there a limit to the number of queries you can include in a single transaction? (hundreds / thousands?) .... likewise, is there a limit or range for how often commits should be executed in a given time? .... [RE: i'm scraping ~100k pdfs, w/ ~12 pages each, which will require ~4hrs of constant 'insert' statements over ~37 tables] ... thanks again! – greenhouse Jan 29 '22 at 07:02