0

I have a PostgreSQL 11 server with 8 cores. I am running a series of ~50+ queries in VBA using ADODB that pull down result sets into Excel. The queries take ~1 hour to run. I'd like to run these queries in parallel (through multiple connections) instead of serially.

My understanding is that each connection instance to PostgreSQL only uses 1 core max, so 7 of my 8 cores are not being used. I'd like to establish multiple connections to the PostgreSQL so I can leverage all 8 cores and speed up processing by having each connection run, say, 6 queries each in parallel instead of running all ~50 in serial.

I've looked into the async functions for the Postgres ODBC driver, and I am not sure that resolves the issue. I don't believe that establishes multiple connections.

Dim conn As ADODB.Connection
Set conn = New ADODB.Connection

conn.Open "DRIVER={PostgreSQL Unicode(x64)}" _
            & ";Server=" & Range("Server").Value _
            & ";Database=" & Range("Database").Value _
            & ";UID=" & Range("Username").Value _
            & ";PWD=" & Range("Password").Value _
            & ";Port=" & Range("Port").Value _
            & ";sslmode=require"

conn.CommandTimeout = 600

EDIT: I don't believe my bottleneck is my hard drive (I use an SSD). I tried running a single thread pgbench and got 300 tps. Running a four thread pgbench got 1,200 tps. So I think there's definitely untapped processing power here that I'd love to tap.

EDIT: @TimWilliams, I looked through that example, and it is super helpful, thanks! I have an additional question. That example just processes queries without returning a result set. If each of my ~50 queries returns results (like below), how does the async work?

sQRY = "select * from " & Chr(34) & "sFunction" & Chr(34) & "()"
rs.CursorLocation = adUseClient
rs.Open sQRY, conn, adOpenStatic, adLockReadOnly
Sheets("sFunction").Range("A1").CopyFromRecordset rs
rs.Close
  • I dont know about vba. But C# have some function for that. I made a question about it some time ago. https://codereview.stackexchange.com/questions/155253/splitting-a-list-of-numbers-into-multiple-lists And I dont think is true queries use only 1 CPU. Usually the bottleneck is the disk reading. – Juan Carlos Oropeza Jan 25 '19 at 15:26
  • Sorry if was a missunderstanding. Disk is just a factor, but usually the biggest one. As I show in my question link I do get better performance splitting my query into multiple threads because those query requiere heavy calculations. – Juan Carlos Oropeza Jan 25 '19 at 15:48
  • https://stackoverflow.com/questions/21933099/running-multiple-async-queries-with-adodb-callbacks-not-always-firing – Tim Williams Jan 25 '19 at 17:39
  • Thanks @TimWilliams - made clarifying edit above. – BadgerBadgerBadger Jan 26 '19 at 21:58
  • I can take a look but not until after the weekend. Also - I'd be using a different database so I'm not clear whatever I come up with would apply in your case. – Tim Williams Jan 26 '19 at 22:59

0 Answers0