6

I'm using a python for my backend API server. My backend connect to a DISTANT Postgres database using SQLALCHEMY (without flask-sqlalchemy because i need that connector even without flask context (for others specific tasks)

My trouble is : Sometime, my SQLConnector crash, saying :

(psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

This happends randomly. I tried to fix it using a flush mechanism, without success... (for the record, It's in a Docker)

What do I want ? Get a python software (sqlalchemy) grabbing data from a remote database (postgres) periodically, WITHOUT disconnecting randomly over the time

What did I try to fix it trying auto reconnect on error, and change the engine connection without success or proper code snippet

Where I am at ? I'm stuck, i can't figure out HOW to reproduce the bug WHEN I WANT (for faster test/debug process) not avid thoses annoying disconnect on my sql connector

(psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 [SQL: 'INSERT INTO scan_oee (uniqid, id_kanban, created_at, finished, debug) VALUES (%(uniqid)s, %(id_kanban)s, %(created_at)s, %(finished)s, %(debug)s)'] [parameters: {'uniqid': '2a1c80df-e676-456c-9e8e-a1a32a4b4171', 'id_kanban': '1804033-A02-289', 'created_at': '2018-11-07T09:38:00.125942', 'finished': False, 'debug': True}] (Background on this error at: http://sqlalche.me/e/e3q8)
=== BackListener Exception ===
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "backlistener.py", line 126, in <module>
    raise e
  File "backlistener.py", line 121, in <module>
    backlistener.run()
  File "backlistener.py", line 112, in run
    self.route_data(n)
  File "backlistener.py", line 69, in route_data
    insert_oee_into_db(n, self.action_stop_oee)
  File "/usr/src/app/taktTime.py", line 109, in insert_oee_into_db
    db.add(new_scan)
  File "/usr/src/app/db/base.py", line 55, in __exit__
    self.session_factory.commit()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 943, in commit
    self.transaction.commit()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 467, in commit
    self._prepare_impl()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 447, in _prepare_impl
    self.session.flush()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2254, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2380, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2344, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 391, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 556, in execute
    uow
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
    mapper, table, insert)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 830, in _emit_insert_statements
    execute(statement, multiparams)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 [SQL: 'INSERT INTO scan_oee (uniqid, id_kanban, created_at, finished, debug) VALUES (%(uniqid)s, %(id_kanban)s, %(created_at)s, %(finished)s, %(debug)s)'] [parameters: {'uniqid': '2a1c80df-e676-456c-9e8e-a1a32a4b4171', 'id_kanban': '1804033-A02-289', 'created_at': '2018-11-07T09:38:00.125942', 'finished': False, 'debug': True}] (Background on this error at: http://sqlalche.me/e/e3q8)

And there is the Connector snippet I'm using

base.py

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from config.settings import SETTINGS
from psycopg2 import IntegrityError as PgIntegrityError
from sqlalchemy.exc import IntegrityError as SQLIntegrityError
from server.flaskutils import print_flush

engine = create_engine("postgresql+psycopg2://" + SETTINGS['POSTGRES_USER'] + ":" +
                       SETTINGS['POSTGRES_PASSWORD'] + '@' + SETTINGS['POSTGRES_HOST'] + ':' + SETTINGS['POSTGRES_PORT'] + '/' + SETTINGS['POSTGRES_DB'])
# use session_factory() to get a new Session
_SessionFactory = scoped_session(sessionmaker(bind=engine))

Base = declarative_base()


class Database():
    def __init__(self, auto_commit=False, create_all=False):
        self.auto_commit = auto_commit
        self.session_factory = session_factory()

        if create_all:
            Base.metadata.create_all(engine)


    def __enter__(self):
        return self.session_factory

    def __exit__(self, type, value, traceback):
        if self.auto_commit:
            self.session_factory.commit()
        self.session_factory.close()


def session_factory():
    # Base.metadata.create_all(engine)
    return _SessionFactory()

backlistener.py -> My file (NOT using flask, it's a deamon) from db.base import Database

def insert_oee_into_db(xid, ended):
with Database(auto_commit=True) as db:
    new_scan = ScanOEE(xid, ended)
    db.add(new_scan)

models.py

class ScanOEE(Base):
    __tablename__ = 'scan_oee'

    uniqid = Column(String(36), primary_key=True)
    id_kanban = Column(String(50), unique=False)
    created_at = Column(DateTime, unique=False)
    finished = Column(Boolean, unique=False)
    debug = Column(Boolean, unique=False)

    def __init__(self, name, finished):
        self.uniqid = str(uuid4())
        self.id_kanban = name
        self.created_at = datetime.datetime.utcnow().isoformat()
        self.finished = finished
        self.debug = SETTINGS['DEBUG_MODE']

    def __repr__(self):
        return '<ScanOEE %r (%s) %s %s>' % (self.id_kanban, self.uniqid, str(self.created_at), str(finished))

Any snippet or help to show me what's wrong, what's the good practice would be perfect

EDIT 1

docker logs --tail all kanban_postgres_1
LOG:  database system was interrupted; last known up at 2018-10-19 10:06:37 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/422B1B8: wanted 24, got 0
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2018-10-19 10:21:33 UTC
LOG:  MultiXact member wraparound protections are now enabled
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection with an open transaction
LOG:  unexpected EOF on client connection with an open transaction
LOG:  database system was interrupted; last known up at 2018-10-23 12:29:06 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/4419C98: wanted 24, got 0
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  database system was interrupted; last known up at 2018-10-23 12:53:43 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/4419D08
LOG:  invalid record length at 0/4422358: wanted 24, got 0
LOG:  redo done at 0/44213E0
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  incomplete startup packet
LOG:  invalid length of startup packet
LOG:  invalid length of startup packet
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  incomplete startup packet
LOG:  invalid length of startup packet
LOG:  invalid length of startup packet
LOG:  invalid length of startup packet
LOG:  invalid length of startup packet
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not send data to client: Broken pipe
STATEMENT:  SELECT scan_sm.uniqid AS scan_sm_uniqid, scan_sm.id_kanban AS scan_sm_id_kanban, scan_sm.created_at AS scan_sm_created_at, scan_sm.pos_x AS scan_sm_pos_x, scan_sm.pos_y AS scan_sm_pos_y, scan_sm.pos_z AS scan_sm_pos_z, scan_sm.seconds_since_last_scan AS scan_sm_seconds_since_last_scan, scan_sm.proceed_at AS scan_sm_proceed_at, scan_sm.state AS scan_sm_state 
        FROM scan_sm 
        WHERE scan_sm.proceed_at IS NOT NULL AND scan_sm.state = 'SUPERMARKET_FAST' ORDER BY scan_sm.created_at DESC
FATAL:  connection to client lost
STATEMENT:  SELECT scan_sm.uniqid AS scan_sm_uniqid, scan_sm.id_kanban AS scan_sm_id_kanban, scan_sm.created_at AS scan_sm_created_at, scan_sm.pos_x AS scan_sm_pos_x, scan_sm.pos_y AS scan_sm_pos_y, scan_sm.pos_z AS scan_sm_pos_z, scan_sm.seconds_since_last_scan AS scan_sm_seconds_since_last_scan, scan_sm.proceed_at AS scan_sm_proceed_at, scan_sm.state AS scan_sm_state 
        FROM scan_sm 
        WHERE scan_sm.proceed_at IS NOT NULL AND scan_sm.state = 'SUPERMARKET_FAST' ORDER BY scan_sm.created_at DESC
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection with an open transaction
LOG:  unexpected EOF on client connection with an open transaction
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection with an open transaction
LOG:  unexpected EOF on client connection with an open transaction
LOG:  unexpected EOF on client connection with an open transaction
WARNING:  worker took too long to start; canceled
WARNING:  autovacuum worker started without a worker entry
WARNING:  worker took too long to start; canceled
WARNING:  autovacuum worker started without a worker entry
LOG:  could not receive data from client: Connection reset by peer
LOG:  invalid length of startup packet
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  database system was interrupted; last known up at 2018-11-02 12:40:55 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/4D4FD10
LOG:  invalid record length at 0/4D7A8F8: wanted 24, got 0
LOG:  redo done at 0/4D7A840
LOG:  last completed transaction was at log time 2018-11-02 12:44:43.914143+00
LOG:  MultiXact member wraparound protections are now enabled
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  database system was interrupted; last known up at 2018-11-02 13:16:17 UTC
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/4D7A968: wanted 24, got 0
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  could not receive data from client: Connection reset by peer
LOG:  invalid length of startup packet
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  database system was interrupted; last known up at 2018-11-05 10:36:02 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/4DF53E0: wanted 24, got 0
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  could not receive data from client: Connection reset by peer
LOG:  database system was interrupted; last known up at 2018-11-05 14:15:34 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/4E1CAE0: wanted 24, got 0
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Connection reset by peer
Skapin
  • 318
  • 2
  • 11
  • First thing to do: look in the postgres logs. – wildplasser Nov 07 '18 at 11:42
  • @wildplasser Thanks for your esponse, I edited my post adding the postgres log. Looks like Log are not help full :/ – Skapin Nov 08 '18 at 07:47
  • Looks like your network connection is interrupted. Is there any firewall/NAT box/software inbetween? Maybe first try to eliminate some men-in-the-middle (such as docker...) – wildplasser Nov 08 '18 at 08:50
  • @wildplasser I did suspect that too. it's a really bad "wifi" connection, so it can be that. – Skapin Nov 08 '18 at 09:41
  • How can I overcome this issue by using some reconnection pattern ? Do my connectio snippet looks bad or okay to you ? Can it be a MAX_CONNECTION reach on something ? thanks a lot by the way – Skapin Nov 08 '18 at 09:42
  • 1
    Interesting, I saw something similar recently. The logs contain nothing except that the other thinks that the other reset the connection. This happens inside a docker stack, on the same host, so each time it tries to retrieve data, it fails and then resets the connection. Quite annoying. – Matthieu Brucher Apr 01 '19 at 13:39
  • 1
    @Skapin Hi, skapin, had you solve the problem? As had a similar problem [how to fix “OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly”](https://stackoverflow.com/questions/55457069/how-to-fix-operationalerror-psycopg2-operationalerror-server-closed-the-conn), can you share the silution? – Riki Apr 03 '19 at 08:57
  • No I haven't.... Still mad at this. My guess is because the connection timeout after a long period. – Skapin Apr 03 '19 at 09:52

1 Answers1

6

@skapin, I had solve the problem.As the server connect sql need a paramteter pool_pre_ping. how to fix “OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly”

from flask_sqlalchemy import SQLAlchemy as _BaseSQLAlchemy

class SQLAlchemy(_BaseSQLAlchemy):
    def apply_pool_defaults(self, app, options):
        super(SQLAlchemy, self).apply_pool_defaults(self, app, options)
        options["pool_pre_ping"] = True

db = SQLAlchemy()
junxian diao
  • 573
  • 1
  • 6
  • 15