I have built a scraper with Selenium running in a Docker container, while the database lives on a small Linode server. The scraped data is then inserted into a PostgreSQL database on the Linode. The data is held as a list of dicts (List[Dict]).
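The insert step looks roughly like this (a simplified sketch; the host, credentials, table, and column names are placeholders, not my real ones):

import asyncpg

# rows is the scraped List[Dict]
async def insert_rows(rows):
    conn = await asyncpg.connect(
        host="my-linode-host", user="scraper",
        password="...", database="scraperdb",
    )
    try:
        # executemany runs the INSERT once per argument tuple
        await conn.executemany(
            "INSERT INTO items (url, price) VALUES ($1, $2)",
            [(r["url"], r["price"]) for r in rows],
        )
    finally:
        await conn.close()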
However, this error is sometimes raised when I try to insert the data:
Client log:
asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation
Server log:
LOG: could not receive data from client: Connection reset by peer
LOG: could not receive data from client: Operation timed out
I have tried numerous solutions from Stack Overflow, such as:
Connection was closed in the middle of operation when accesing database using Python
and also setting the TCP keepalive parameters on the Postgres side to:
# - TCP settings -
# see "man tcp" for details
tcp_keepalives_idle = 300               # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
tcp_keepalives_interval = 60            # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
tcp_keepalives_count = 100              # TCP_KEEPCNT;
but to no avail.
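A client-side workaround I am considering is to reconnect and retry the insert whenever the connection turns out to be dead (rough sketch; the DSN and table are placeholders):

import asyncpg

DSN = "postgresql://scraper:...@my-linode-host:5432/scraperdb"  # placeholder

async def insert_with_retry(rows, retries=2):
    last_err = None
    for attempt in range(retries):
        conn = await asyncpg.connect(DSN)
        try:
            await conn.executemany(
                "INSERT INTO items (url, price) VALUES ($1, $2)",
                [(r["url"], r["price"]) for r in rows],
            )
            await conn.close()
            return
        except (asyncpg.exceptions.ConnectionDoesNotExistError, ConnectionResetError) as err:
            conn.terminate()  # drop the broken connection immediately
            last_err = err
    raise last_err

That said, I would rather understand why the connection is being dropped in the first place.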
Additionally, I have tried to find any related errors in the Postgres logs themselves, but there don't seem to be any. These are my log settings:
# - Where to Log -
log_destination = 'stderr'              # Valid values are combinations of
                                        # stderr, csvlog, syslog, and eventlog,
                                        # depending on platform. csvlog
                                        # requires logging_collector to be on.
# This is used when logging to stderr:
logging_collector = on                  # Enable capturing of stderr and csvlog
                                        # into log files. Required to be on for
                                        # csvlogs.
                                        # (change requires restart)
# These are only used if logging_collector is on:
log_directory = 'pg_log'                # directory where log files are written,
                                        # can be absolute or relative to PGDATA
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
                                        # can include strftime() escapes
#log_file_mode = 0600                   # creation mode for log files,
                                        # begin with 0 to use octal notation
#log_rotation_age = 1d                  # Automatic rotation of logfiles will
                                        # happen after that time. 0 disables.
#log_rotation_size = 10MB               # Automatic rotation of logfiles will
                                        # happen after that much log output.
                                        # 0 disables.
#log_truncate_on_rotation = off         # If on, an existing log file with the
                                        # same name as the new log file will be
                                        # truncated rather than appended to.
                                        # But such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation. Default is
                                        # off, meaning append to existing files
                                        # in all cases.
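One thing I notice is that this block only controls where the logs go, not what gets logged, so I am considering enabling connection logging so the server records each disconnect (these are standard postgresql.conf settings):

log_connections = on                    # log each successful connection
log_disconnections = on                 # log end of a session, including duration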
One theory I have is that some data takes longer to scrape and format, which leaves the connection idle long enough to get reset. What is strange is that subsequent inserts then succeed.
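If that theory is right, one idea would be to only check a connection out of a pool for the duration of each insert, and let the pool recycle connections that have sat idle too long (sketch; the DSN, table, and pool sizes are placeholders/guesses):

import asyncpg

DSN = "postgresql://scraper:...@my-linode-host:5432/scraperdb"  # placeholder

async def insert_batch(pool, rows):
    # a connection is checked out only for the duration of the insert,
    # so nothing sits idle on the wire during the long scrape step
    async with pool.acquire() as conn:
        await conn.executemany(
            "INSERT INTO items (url, price) VALUES ($1, $2)",
            [(r["url"], r["price"]) for r in rows],
        )

async def run(batches):
    # max_inactive_connection_lifetime makes the pool close and replace
    # connections that have been idle too long instead of handing back
    # one the server may already have dropped
    pool = await asyncpg.create_pool(
        DSN, min_size=1, max_size=4, max_inactive_connection_lifetime=60.0
    )
    try:
        for rows in batches:
            await insert_batch(pool, rows)
    finally:
        await pool.close()

Is something like this the right direction, or is the repeated reconnecting a sign that something else is misconfigured?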
Any help would be appreciated! Thanks