
I have built a scraper using Selenium in one Docker container, while the database lives on a small Linode server.

The scraped data is then inserted into a Postgres database on the Linode server.

The scraped data is stored as a list of dicts (List[Dict]).
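A minimal sketch of turning that List[Dict] into positional rows for asyncpg's Connection.executemany(), which takes a query with $1, $2, ... placeholders plus an iterable of argument tuples. The table name listings and the columns title, price and url are hypothetical placeholders, not taken from the question.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical column names; replace with the real table's columns.
COLUMNS: Tuple[str, ...] = ("title", "price", "url")


def to_rows(records: List[Dict[str, Any]], columns: Sequence[str] = COLUMNS) -> List[Tuple[Any, ...]]:
    """Turn the scraper's List[Dict] into positional tuples for executemany().

    Missing keys become None (NULL in Postgres) instead of raising KeyError.
    """
    return [tuple(record.get(col) for col in columns) for record in records]


def build_insert_sql(table: str, columns: Sequence[str] = COLUMNS) -> str:
    """Build an INSERT with asyncpg-style $1, $2, ... placeholders.

    Identifiers are interpolated directly, so only pass trusted constants here.
    """
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
```

With an open asyncpg connection this would be used as await conn.executemany(build_insert_sql("listings"), to_rows(data)). Converting the rows before any connection is opened also keeps the formatting work out of the time a connection is held.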

However, inserting the data sometimes fails with this error:

Client log:

asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation

Server log:

LOG:  could not receive data from client: Connection reset by peer
LOG:  could not receive data from client: Operation timed out

I have tried several solutions from Stack Overflow, such as "Connection was closed in the middle of operation when accesing database using Python", and I have also tried setting the TCP keepalive parameters on the Postgres side:

# - TCP settings -
# see "man tcp" for details

tcp_keepalives_idle = 300       # TCP_KEEPIDLE, in seconds;
                    # 0 selects the system default
tcp_keepalives_interval = 60        # TCP_KEEPINTVL, in seconds;
                    # 0 selects the system default
tcp_keepalives_count = 100      # TCP_KEEPCNT;

but to no avail.
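As a rough worked example of what these three values mean: after tcp_keepalives_idle seconds of silence the server starts probing, and it gives up on the peer after tcp_keepalives_count unanswered probes sent tcp_keepalives_interval seconds apart. The sketch below assumes that simple model (an approximation of typical TCP stack behaviour); with the values above it gives about 300 + 100 * 60 = 6300 seconds, so a vanished client could go unnoticed by the server for well over an hour.

```python
def keepalive_give_up_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds before TCP keepalive declares a silent peer dead.

    idle:     seconds of inactivity before the first probe (tcp_keepalives_idle)
    interval: seconds between unanswered probes (tcp_keepalives_interval)
    count:    unanswered probes before the connection is dropped (tcp_keepalives_count)
    """
    return idle + interval * count


# Values from the postgresql.conf excerpt above.
total = keepalive_give_up_seconds(idle=300, interval=60, count=100)
print(total, "seconds, about", round(total / 3600, 2), "hours")  # prints: 6300 seconds, about 1.75 hours
```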

I have also tried to log errors in Postgres itself, but there don't seem to be any.

These are my logging settings:

# - Where to Log -

log_destination = 'stderr'      # Valid values are combinations of
                    # stderr, csvlog, syslog, and eventlog,
                    # depending on platform.  csvlog
                    # requires logging_collector to be on.

# This is used when logging to stderr:
logging_collector = on      # Enable capturing of stderr and csvlog
                    # into log files. Required to be on for
                    # csvlogs.
                    # (change requires restart)

# These are only used if logging_collector is on:
log_directory = 'pg_log'            # directory where log files are written,
                    # can be absolute or relative to PGDATA
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
                    # can include strftime() escapes
#log_file_mode = 0600           # creation mode for log files,
                    # begin with 0 to use octal notation
#log_rotation_age = 1d          # Automatic rotation of logfiles will
                    # happen after that time.  0 disables.
#log_rotation_size = 10MB       # Automatic rotation of logfiles will
                    # happen after that much log output.
                    # 0 disables.
#log_truncate_on_rotation = off     # If on, an existing log file with the
                    # same name as the new log file will be
                    # truncated rather than appended to.
                    # But such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation.  Default is
                    # off, meaning append to existing files
                    # in all cases.
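Following the suggestion in the comments to add a timestamp and process ID via log_line_prefix, a sketch like this could group the "could not receive data from client" messages by backend PID, which would show whether the two messages came from the same connection. It assumes log_line_prefix = '%m [%p] ' (%m is the timestamp with milliseconds, %p the process ID); the regex is an assumption tied to that prefix and must be adjusted for any other one.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumes log_line_prefix = '%m [%p] ', i.e. lines shaped like
# "<YYYY-MM-DD HH:MM:SS.mmm TZ> [<pid>] LOG:  could not receive data from client: <reason>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \S+) "
    r"\[(?P<pid>\d+)\] LOG:\s+could not receive data from client: (?P<reason>.+)$"
)


def group_by_pid(lines: Iterable[str]) -> Dict[int, List[Tuple[str, str]]]:
    """Map backend PID -> [(timestamp, reason), ...] for dropped-client messages."""
    grouped: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that don't match the assumed format are skipped
            grouped[int(match["pid"])].append((match["ts"], match["reason"]))
    return dict(grouped)
```

An open log file handle can be passed directly, since iterating a file yields its lines. Two PIDs with near-identical timestamps would point towards two separate connections dropping at once rather than one connection logging two reasons.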

One theory I have is that some data takes longer to scrape and format, which causes the connection to be reset. However, subsequent inserts succeed.
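If that theory holds, one mitigation worth trying is to avoid holding a single connection open across the slow scraping step: acquire a connection from a pool only when a batch is ready, and retry the insert if the connection turns out to be dead. Below is a minimal, stdlib-only retry helper under that assumption. The asyncpg calls in the trailing comment (Pool.acquire, Connection.transaction, Connection.executemany, asyncpg.exceptions.ConnectionDoesNotExistError) are real asyncpg APIs, while pool, sql and rows are hypothetical placeholders.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retry(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await operation(), retrying with exponential backoff on the given exceptions.

    operation is called again on every attempt, so it should acquire a fresh
    connection each time instead of reusing one that may already be dead.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical usage with asyncpg (pool, sql and rows are placeholders):
#
# async def insert_batch():
#     async with pool.acquire() as conn:    # connection taken only when data is ready
#         async with conn.transaction():    # an attempt that dies mid-batch leaves no partial rows
#             await conn.executemany(sql, rows)
#
# await with_retry(insert_batch, (asyncpg.exceptions.ConnectionDoesNotExistError, ConnectionResetError))
```

asyncpg.create_pool() also accepts a max_inactive_connection_lifetime parameter (300 seconds by default) that closes pooled connections left idle longer than that, so the pool hands out a live connection rather than one that has been sitting unused through a long scrape.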

Any help would be appreciated! Thanks

internn00b
  • "Additionally, I have tried to log any errors in postgres itself but there doesn't seem to be any." What is that stuff labelled "server log", then, if not the server log? – jjanes Feb 06 '23 at 15:32
  • @jjanes Hi, what I meant was that apart from the errors I had received, I thought more was going on behind the scenes than the terminal logs showed. So I set up Postgres to log into a pg_logs folder. – internn00b Feb 07 '23 at 01:38
  • You should change the log_line_prefix to put the timestamp in the log file (and probably things like the IP address as well). I'm wondering whether those two messages were logged one immediately after the other, or were separated in time. – jjanes Feb 07 '23 at 02:05
  • @jjanes Yes, like you said, those two messages appeared together. I was manually monitoring both the client and server logs. – internn00b Feb 07 '23 at 06:03
  • Were they from the same backend PID? Maybe some network glitch could cause two different connections to drop at the same time, but with different apparent reasons. I see "Connection reset by peer" a lot when the client closes without explicitly saying "goodbye". But I am not familiar with "Operation timed out" in this context. – jjanes Feb 07 '23 at 21:31
