
I am using Python 3.6.8 and have a function that needs to run 77 times. It pulls data out of PostgreSQL, does a statistical analysis, and then puts the results back into the database. I can only run 3 processes at the same time: one at a time takes way too long (about 10 min per function call), and I can only have 3 DB connections open at once. I am trying to use the Pool class from the multiprocessing library, but it is trying to start all of them at once, which causes a "too many connections" error. Am I using Pool correctly? If not, what should I use to limit it to ONLY 3 function calls starting and finishing at the same time?


from datetime import datetime
from multiprocessing import Pool
import psycopg2

def AnalysisOf3Years(data):
    ...  # statistical analysis runs here (about 10 min per call)

###### START OF THE PROGRAM ######
print("StartTime ((FINAL 77)): ", datetime.now())
con = psycopg2.connect(host="XXX.XXX.XXX.XXX", port="XXXX", user="USERNAME",
                       password="PASSWORD", dbname="DATABASE")
cur = con.cursor()
date = datetime.today()
cur.execute("SELECT Value FROM table")
Array = cur.fetchall()
con.close()
Data = []
for value in Array:
    Data.append([value, date])
p = Pool(3)
p.map(AnalysisOf3Years, Data)
print("EndTime ((FINAL 77)): ", datetime.now())

2 Answers


You can use the SQLAlchemy Python package, which has database connection pooling as standard functionality.

It works with Postgres and many other database backends.

from sqlalchemy import create_engine

engine = create_engine('postgresql://me@localhost/mydb',
                       pool_size=3, max_overflow=0)

pool_size is the maximum number of connections to the database. You can set it to 3.

This page has some examples of how to use it with Postgres -

https://docs.sqlalchemy.org/en/13/core/pooling.html

Based on your use case you might be also interested in SingletonThreadPool

https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.SingletonThreadPool

A Pool that maintains one connection per thread.
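For instance, a minimal sketch of selecting that pool class (the `sqlite://` URL here is just a stand-in so the snippet runs without a database server; in practice you would use your `postgresql://` URL):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import SingletonThreadPool

# SingletonThreadPool keeps one connection per thread; pool_size caps
# how many threads may each hold a connection at once.
engine = create_engine("sqlite://", poolclass=SingletonThreadPool, pool_size=3)
```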

Tagar

It appears you only briefly need your database connection, with the bulk of the script's time spent processing the data. If this is the case, you may wish to pull the data out once and then write it to disk. You can then load this data fresh from disk in each new instance of your program, without having to worry about your connection limit to the database.
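A minimal sketch of that approach (the function and file names are illustrative, not from the original script):

```python
import pickle

def dump_rows(rows, path="rows.pkl"):
    # fetch the rows from the database once, then cache them on disk
    with open(path, "wb") as f:
        pickle.dump(rows, f)

def load_rows(path="rows.pkl"):
    # each worker process reloads the cached rows instead of opening
    # its own database connection
    with open(path, "rb") as f:
        return pickle.load(f)
```

Each child process then calls load_rows() at startup and never touches the database.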

If you want to look into connection pooling, you may wish to use pgbouncer. This is a separate program that sits between your main program and the database, pooling the number of connections you give it. You are then free to write your script as a single-threaded program, and you can spawn as many instances as your machine can cope with.

It's hard to tell why your program is misbehaving, as the indentation appears to be wrong. But at a guess, it would seem you do not create and use your pool inside a __main__ guard, which on certain OSes could lead to all sorts of weird issues.

You would expect well behaving code to look something like:

from multiprocessing import Pool

def double(x):
    return x * 2

if __name__ == '__main__':
    # means pool only gets created in the main parent process and not in the child pool processes
    with Pool(3) as pool:
        result = pool.map(double, range(6))
    assert result == [0, 2, 4, 6, 8, 10]
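If you do want each worker to talk to the database directly, you can still cap yourself at 3 connections by opening one connection per worker in a Pool initializer. A sketch of that idea, using the standard-library sqlite3 so it runs anywhere (swap sqlite3.connect for psycopg2.connect in the real script; all names here are illustrative):

```python
import sqlite3
from multiprocessing import Pool

_con = None  # one connection per worker process, opened once

def init_worker(db_path):
    global _con
    _con = sqlite3.connect(db_path)

def analysis(value):
    # every call in this worker reuses the worker's single connection,
    # so Pool(3) means at most 3 open connections in total
    (result,) = _con.execute("SELECT ? * 2", (value,)).fetchone()
    return result

def run_pool():
    with Pool(3, initializer=init_worker, initargs=(":memory:",)) as pool:
        return pool.map(analysis, range(5))

if __name__ == '__main__':
    print(run_pool())
```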
Dunes