
I encountered the following error after inserting 95 rows of video metadata out of 4000. sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r).

Base:

import sqlalchemy as db
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = db.create_engine(connect_string)
con = engine.connect()
_SessionFactory = sessionmaker(bind=engine)
Base = declarative_base()

def session_factory():
    Base.metadata.create_all(engine)
    return _SessionFactory()

Video object:

from sqlalchemy import Table, Column, Integer, String, Date, DECIMAL, ForeignKey
from sqlalchemy.orm import relationship

class Video(Base):
    __tablename__ = 'video'
    video_to_person = Table('video_to_person', Base.metadata,
                            Column('video_id', String, ForeignKey('video.vid')),
                            Column('person_id', Integer, ForeignKey('person.id'))
                            )

    _vid = Column("vid",String, primary_key=True)
    _webpage_url = Column("webpage_url", String)
    _upload_date = Column("upload_date", Date)
    _uploader = Column("uploader", String)
    _view_count = Column("view_count", DECIMAL)
    _like_count = Column("like_count", DECIMAL)
    _dislike_count = Column("dislike_count", DECIMAL)
    _format = Column("format", String)
    _duration = Column("duration", DECIMAL)
    _title = Column("title", String)
    _description = Column("description", String)
    persons = relationship("Person", secondary=video_to_person)

Video repository:

class VideoRepository:

    def create_video(self, vid: Video):
        session = session_factory()
        session.add(vid)
        session.commit()
        session.close()

How can I improve the connection management?

UPDATE: Thanks for the responses so far. One of the challenges is that all my model classes e.g. (class Video) inherit from Base. Base always creates a new engine object. I will look into further refactoring.

wuz

1 Answer

As explained in the documentation for that particular error, you get it because your application exceeded the number of connections it is allowed to open/use in parallel, according to the limits set on the create_engine call. In this case it is using the defaults: pool_size=5, max_overflow=10 and pool_timeout=30.

This means that with a single engine you can use up to 15 concurrent connections to the database. Once that limit is reached, a request for a new connection will wait 30 seconds; if none of the 15 already-established connections is released in the meantime, it raises the error.
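Those three limits map directly onto create_engine keyword arguments. A minimal sketch (the SQLite connection string and the explicit poolclass are just stand-ins for illustration; the values shown are SQLAlchemy's defaults):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "sqlite:///pool_demo.db",  # placeholder connection string
    poolclass=QueuePool,       # the pool type these parameters belong to
    pool_size=5,               # connections kept open in the pool
    max_overflow=10,           # extra connections allowed beyond pool_size
    pool_timeout=30,           # seconds to wait for a free connection
)                              # -> at most 5 + 10 = 15 concurrent connections
```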

As explained in the doc, this could have different causes:

  • The application is fielding too many concurrent requests to do work based on the configured value for the pool
  • The application is not returning connections to the pool
  • The application is attempting to run long-running transactions
  • The application is deadlocking

Given the information provided, my guess is that either:

  • You use too many threads (i.e. >> 15) and at some point your engine isn't able to provide new connections

  • You use a limited number of threads (maybe 16) and you have a deadlock in your code.

My recommendation:

  • Check your slow query logs and look for queries that locked the table for a long period

  • Refactor the code to avoid calling Base.metadata.create_all(engine) every time you get a session. This is usually called once when the application starts, not every time you insert a record. If you need it there, at least set checkfirst=True so that it won't fire CREATE TABLE statements. This could potentially be a source of deadlock

  • If possible, use bulk inserts. You'll get a massive performance boost for free and make better use of your connection pool.

  • Once you identify the root cause of your problem (and only after that), you can adapt the two parameters pool_size and max_overflow. You can easily increase pool_size to 15 or 25 and max_overflow to 15
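The recommendations above can be sketched as follows. This is an illustrative refactor, not the asker's actual application: the names mirror the question's code but the model is reduced to two columns, the in-memory SQLite URL is a placeholder, and SQLAlchemy 1.4+ is assumed. The key points are a single module-level engine, create_all run once at startup, short-lived sessions that are always closed, and one bulk insert instead of one commit per row:

```python
from sqlalchemy import create_engine, Column, String
from sqlalchemy.orm import sessionmaker, declarative_base

engine = create_engine("sqlite://")       # created once, at startup
Session = sessionmaker(bind=engine)
Base = declarative_base()

class Video(Base):
    __tablename__ = "video"
    vid = Column(String, primary_key=True)
    title = Column(String)

Base.metadata.create_all(engine)          # run once at startup, not per session

def create_videos(videos):
    """Insert many rows in one transaction; the session is always returned."""
    session = Session()
    try:
        session.bulk_save_objects(videos)  # single batched insert
        session.commit()
    finally:
        session.close()                    # connection goes back to the pool
```

With this shape, inserting 4000 rows uses one connection for one transaction instead of opening a session (and re-running the schema DDL) per row.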

bagerard
  • Thanks a lot. I guess the first problem is that engine and con are created for every model object because the model object inherits Base. In Java I would simply use a singleton-pattern approach with a DB manager (e.g. https://stackoverflow.com/questions/40525545/single-database-connection-throughout-the-python-application-following-singleto). Unfortunately I cannot find any examples of how to implement that with SQLAlchemy. – wuz Sep 09 '19 at 08:59
  • How do you run this? Is it part of a web application, or a stand-alone script (e.g. CLI) that you run multiple times? Please clarify this, plus how you deal with threading. Usually the engine is created once globally and the session factory takes care of sharing it (that's why it takes the engine as input), so at first sight it looks OK. – bagerard Sep 09 '19 at 11:23