I have a very basic application that downloads data from a server via a socket connection one record at a time and writes each record to a postgresql database. This is only used by me.
downloader = DataDownloader.Downloader()
con = psycopg2.connect(datebase='db', user='username', password='pswrd')
downloader.get_data(con)
con.close()
In the above code I pass the psycopg2 connection to an instance of a class which then does all the downloading and database updates.
The issue is that this process is extremely slow since within get_data()
the class only downloads one record at a time. I have started to change this to incorporate threading:
downloader = DataDownloader.Downloader()
records_to_download = 'abc'
thread1 = threading.Thread(target=downloader.get_data, args=(records_to_download))
thread1.start()
records_to_download = 'xyz'
thread2 = threading.Thread(target=downloader.get_data, args=(records_to_download))
thread2.start()
records_to_download = 'ghj'
thread3 = threading.Thread(target=downloader.get_data, args=(records_to_download))
thread3.start()
All i've done is once again create a single downloader
instance. Then create 3 threads which each runs get_data
by passing it a parameter specifying which records to download.
Instead of creating a psycopg2 connection and passing it to get_data
I have instead created the connection within get_data
so that each thread will have it's own connection.
My question is, is that sufficient and correct to ensure all the threads will 'play nice' with the database simultaneously? Each thread is still using the same username/password to access the database. I have ensured that each thread will be updating different records by passing it a records_to_download
parameter.
I have heard about connection pooling but is that overkill for a one user, multi-thread setup such as this? (I have never used connection pooling and my database knowledge is very basic,...so if this is not necessary then I would avoid it).
Lastly, is there a limit to how many threads I can use for this kind of setup before being forced to use a more robust approach such as connection pooling? Or is connection pooling required right from the get go?