66

Using the psycopg2 package with Python 2.7 I keep getting the titled error: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected

It only occurs when I add a WHERE column LIKE ''%X%'' clause to my pgrouting query. An example:

SELECT id1 as node, cost FROM PGR_Driving_Distance(
  'SELECT id, source, target, cost 
     FROM edge_table
     WHERE cost IS NOT NULL and column LIKE ''%x%'' ',
  1, 10, false, false)

Threads on the internet suggest it is an issue with SSL, but whenever I comment out the pattern-matching part of the query, both the query and the connection to the database work fine.

This is on a local database running Xubuntu 13.10.

After further investigation: it looks like this may be caused by the pgRouting extension crashing the database, because the query is bad and there are no links which have this pattern.

Will post an answer soon ...

Phil Donovan

8 Answers

46

The error: psycopg2.OperationalError: SSL SYSCALL error: EOF detected

The setup: Airflow + Redshift + psycopg2

When: Queries take a long time to execute (more than 300 seconds).

A socket timeout occurs in this instance. What solves this specific variant of the error is adding keepalive arguments to the connection string.

keepalive_kwargs = {
    "keepalives": 1,
    "keepalives_idle": 30,
    "keepalives_interval": 5,
    "keepalives_count": 5,
}

connection = psycopg2.connect(connection_string, **keepalive_kwargs)

Redshift requires a keepalives_idle of less than 300. A value of 30 worked for me; your mileage may vary. It is also possible that keepalives_idle is the only argument you need to set, but make sure keepalives is set to 1.
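If only the idle setting matters in your environment, a minimal sketch might look like this (the DSN below is a placeholder, not a real cluster):

import psycopg2

# Minimal variant: enable keepalives and shorten the idle timeout only.
# The DSN is a placeholder; substitute your own connection string.
conn = psycopg2.connect(
    "host=my-cluster.example.com port=5439 dbname=analytics user=etl password=secret",
    keepalives=1,        # make sure TCP keepalives are enabled at all
    keepalives_idle=30,  # seconds of inactivity before the first probe (< 300 for Redshift)
)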

Link to docs on postgres keepalives.

Link to airflow doc advising on 300 timeout.

Jurgen Strydom
19

I ran into this problem when running a slow query on a DigitalOcean Droplet. All other SQL ran fine, and it worked on my laptop. After scaling up from a 512 MB to a 1 GB RAM instance it works fine, so it seems this error can occur when the process is running out of memory.
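If you suspect the same thing, a rough sketch for checking free memory from Python on a Linux host is below (MemAvailable assumes a reasonably recent kernel):

def available_memory_mb():
    # Parse /proc/meminfo; values are reported in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) // 1024
    return None

print("MemAvailable: %s MB" % available_memory_mb())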

antonagestam
  • It appears that isn't always the fix: I'm using a machine with 160 GB of RAM and still hit this error while using `pg_dump` on an SSL-only database; only 15 GB is in use. – tedder42 Apr 27 '16 at 00:08
  • Well this might work, but it doesn't sound like a real solution. There should be a way to optimise this somehow – gdvalderrama Jan 11 '17 at 09:53
  • Exactly what I've run into! Added 4 GB of swap space to a 512 MB instance and everything worked like a charm. – Max Malysh Mar 07 '17 at 04:27
  • My DB has 16 GB of RAM dedicated and no system swap is being used, but I still run into this issue. It only happens for a small number of queries... weird. – gies0r Oct 31 '19 at 10:46
  • @gies0r This problem probably has more causes than memory issues, even though I wouldn't completely rule it out. – antonagestam Oct 31 '19 at 10:57
17

I encountered the same error. CPU and RAM usage looked fine, and the solution by @antonagestam didn't work for me.

Basically, the issue was at the engine-creation step; pool_pre_ping=True solved the problem:

engine = sqlalchemy.create_engine(connection_string, pool_pre_ping=True)

What it does is send a SELECT 1 query each time the connection is about to be used, to check that the connection is still alive. If the check fails, the connection is recycled and checked again; upon success, the actual query is executed.

sqlalchemy docs on pool_pre_ping

In my case, I had the same error in the Python logs. I checked the log files in /var/log/postgresql/ and there were a lot of error messages like "could not receive data from client: Connection reset by peer" and "unexpected EOF on client connection with an open transaction". This can happen due to network issues.
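To confirm the same pattern on the server side, a quick sketch like the following can scan the server log (the path and file name pattern are assumptions; they vary by distribution and PostgreSQL version):

import glob

# Messages that typically accompany the client-side "EOF detected" error.
PATTERNS = (
    "could not receive data from client",
    "unexpected EOF on client connection",
)

for path in glob.glob("/var/log/postgresql/postgresql-*.log"):  # assumed log location
    with open(path) as f:
        for line in f:
            if any(p in line for p in PATTERNS):
                print("%s: %s" % (path, line.rstrip()))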

Ali Tlekbai
10

This issue occurred for me when I had some rogue queries running, causing tables to be locked indefinitely. I was able to see the queries by running:

SELECT * from STV_RECENTS where status='Running' order by starttime desc;

then kill them with:

SELECT pg_terminate_backend(<pid>);
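For reference, a rough psycopg2 sketch of the same workflow, assuming a Redshift connection (the DSN is a placeholder); review the list before terminating anything:

import psycopg2

# Placeholder DSN for a Redshift cluster.
conn = psycopg2.connect("host=my-cluster.example.com port=5439 dbname=analytics user=admin password=secret")
conn.autocommit = True
cur = conn.cursor()

# List currently running queries (STV_RECENTS is Redshift-specific).
cur.execute("SELECT pid, query FROM STV_RECENTS WHERE status = 'Running' ORDER BY starttime DESC;")
for pid, query in cur.fetchall():
    print("%s %s" % (pid, query[:80]))

# After reviewing the output, terminate a specific backend by its pid.
# cur.execute("SELECT pg_terminate_backend(%s);", (some_pid,))
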
FoxMulder900
10

This is a very similar answer to @FoxMulder900's, except I could not get his first select to work. This works, though:

WITH long_running AS (
    SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
    FROM pg_stat_activity
    WHERE (now() - pg_stat_activity.query_start) > interval '1 minutes'
      and state = 'active'
)
SELECT * from long_running;

If you want to kill the processes from long_running, just comment out the last line and insert SELECT pg_cancel_backend(long_running.pid) FROM long_running; instead.
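For reference, a sketch of that cancel variant run through psycopg2 (the connection string is a placeholder); pg_cancel_backend cancels the backend's current query, while pg_terminate_backend would drop the whole session:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
conn.autocommit = True
cur = conn.cursor()
cur.execute("""
    WITH long_running AS (
        SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
        FROM pg_stat_activity
        WHERE (now() - pg_stat_activity.query_start) > interval '1 minutes'
          AND state = 'active'
    )
    SELECT pg_cancel_backend(long_running.pid) FROM long_running;
""")
print(cur.fetchall())  # one row per backend that was signalled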

Charles F
8

In my case it was the OOM killer (the query was too heavy).

Check dmesg:

dmesg | grep -A2 Kill

In my case:

Out of memory: Kill process 28715 (postgres) score 150 or sacrifice child
papko26
  • To the uninitiated it's not really clear what you are saying. Please provide some explanation of what `dmesg` is and why you are running it. – Powertieke Feb 01 '19 at 11:39
  • This could be useful: `dmesg` is just where a lot of Linux kernel messages end up, usually driver messages (e.g. I have been in `dmesg` looking for how to fix my wifi a lot of times). When Linux (and OSes in general) runs out of memory (and swap), the kernel picks one of the running processes and kills it in order to reclaim memory. Note that at that point the OS has two options: kill one process or freeze forever. – toto_tico Feb 06 '19 at 08:17
  • Besides freeing more memory, what other solutions are there? Can we set a timeout to let the user know? – jsibs Apr 30 '21 at 20:19
3

I got this error running a large UPDATE statement on a 3 million row table. In my case it turned out the disk was full. Once I had added more space the UPDATE worked fine.
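A quick, hedged way to check for this before a big UPDATE (the data directory path is an assumption; adjust it for your installation):

import os

# Free space on the filesystem holding the PostgreSQL data directory.
st = os.statvfs("/var/lib/postgresql")  # assumed data directory location
free_gb = st.f_bavail * st.f_frsize / float(1024 ** 3)
print("Free space: %.1f GB" % free_gb)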

2

You may need to express % as %% because % is the placeholder marker. http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
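A short sketch of both options (the column name `name` and the surrounding queries are illustrative, not from the original schema); note that the doubling is only needed when a parameter tuple is passed to execute(), since that is what activates placeholder parsing:

import psycopg2

conn = psycopg2.connect("dbname=routing")  # placeholder connection string
cur = conn.cursor()

# Preferred: bind the whole pattern as a parameter; no escaping needed.
cur.execute("SELECT id FROM edge_table WHERE name LIKE %s", ("%x%",))

# If the pattern is written literally AND other parameters are passed,
# double the percent signs so they are not read as placeholders.
cur.execute(
    "SELECT id FROM edge_table WHERE cost > %s AND name LIKE '%%x%%'",
    (0,),
)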

piro