
How can I commit entries to an SQL database with SQLAlchemy with tolerance for errors? Committing a large batch of entries together is much more efficient, but if there is an error in one of the entries, e.g. text in an integer column, the entire batch cannot be saved to the database. My workaround below commits entries individually, but this method can create too many connections to the MySQL server, particularly when run in parallel. Is there a more efficient way to commit entries as a batch with room for error?

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def commitentry(database, enginetext, verbose=False):
    """
    Takes a list of ORM entries and a connection string that defines
    the SQL engine, and adds every entry in the list to the SQL
    database.
    """
    engine = create_engine(enginetext)
    Session = sessionmaker()
    Session.configure(bind=engine)
    session = Session()
    counter = 0
    for entry in database:
        try:
            session.add(entry)
            session.commit()
        except Exception as e:
            # A bad entry poisons the current transaction; roll it
            # back and move on to the next entry.
            print("Commit Error")
            session.rollback()
            if verbose:
                print(e)
        finally:
            counter += 1
            if verbose:
                print(counter, counter / float(len(database)))

    if verbose:
        print("Entries saved!")
    session.close()
Michael
  • Please rephrase your question. As "Pythonic" is a subjective idea, it's very difficult to understand what you're asking. When someone asks what is the "Pythonic" way to do something, that typically means they are looking for the "best" way to do something. [There are no silver bullets](http://en.wikipedia.org/wiki/No_Silver_Bullet), and thus what is "best" for you is not "best" for someone else. If your code works, and you are looking for more general feedback on your code, consider [codereview.stackexchange.com](http://codereview.stackexchange.com). – Mark Hildreth Jan 22 '14 at 01:39
  • Question updated, not really looking for review of code so much as a way to commit in batches. – Michael Jan 22 '14 at 23:11
  • One possible duplicate http://stackoverflow.com/q/18708050/2003420 – El Bert Jan 22 '14 at 23:20
  • @bvidal Pretty sure that will crash if there is an error in one of the entries. – Michael Jan 22 '14 at 23:31
  • What kind of errors are you talking about? I think you should differentiate between them. – flup Jan 28 '14 at 00:05

1 Answer


I don't think you are looking in the right direction. As far as I know, there is no way to commit a batch such that a single bad entry does not force a rollback of the whole batch.

You should try to catch the error in your code before adding the entry to the session, e.g.:

batch_size = 500
for i, entry in enumerate(database_list):
    try:
        # Your custom function that validates the entry; it should
        # raise ValidationError on error and/or try to 'fix' the entry.
        validate(entry)
        session.add(entry)
    except ValidationError:
        # Skip entries that would fail at the database level.
        pass
    if (i + 1) % batch_size == 0:
        # Commit every `batch_size` entries.
        session.commit()

# Commit whatever is left over in the final, partial batch.
session.commit()
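
For illustration, here is a minimal sketch of such a validator. The `ValidationError` class and the `value` column are hypothetical, not SQLAlchemy APIs; the real checks depend on your table's schema:

class ValidationError(Exception):
    """Raised when an entry would fail at the database level."""

def validate(entry):
    # Hypothetical check for a model with an integer `value` column:
    # reject anything the database would refuse to store.
    if not isinstance(entry.value, int):
        raise ValidationError("value must be an int, got %r" % (entry.value,))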

Lastly, if your bulk insert is taking too long, you may want to use a Core `insert()` instead of the Session API; a sketch of that approach follows.
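
As a rough sketch (the `entries` table, its columns, and the connection string are made up for the example), a Core bulk insert sends one executemany-style statement for a whole list of plain dictionaries, skipping per-object ORM bookkeeping:

from sqlalchemy import MetaData, Table, Column, Integer, create_engine

engine = create_engine("sqlite://")  # placeholder connection string
metadata = MetaData()
entries = Table("entries", metadata,
                Column("id", Integer, primary_key=True),
                Column("value", Integer))
metadata.create_all(engine)

# Insert the whole (pre-validated) batch with a single statement.
with engine.begin() as conn:
    conn.execute(entries.insert(), [{"value": v} for v in range(1000)])

This is the approach described in the SQLAlchemy FAQ linked in the comments below.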

dtheodor
  • Can you explain the difference between using `insert()` and using `session`? – Michael Jan 30 '14 at 07:50
  • It's explained very well at http://docs.sqlalchemy.org/en/rel_0_8/faq.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow , with examples of how to use both methods. – dtheodor Jan 30 '14 at 15:44
  • won't this leave the last batch uncommitted unless it is exactly the size of batch_size? – cdonner May 02 '22 at 19:22