There are several SO answers explaining the difference between the node-postgres (pg) Client and Pool classes. All these answers essentially say to use Pool for efficient use of multiple connections. But as far as I can tell, none say when you must use Client instead of Pool or when it is more advantageous to do so.

I'm developing a serverless solution using the Kysely query builder. Kysely has an adapter for pg, but this adapter only uses the Pool class. I can configure it to use a single connection (max: 1) for serverless, and things seem to work well enough. (I'm using Vercel and Supabase.)
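
(For concreteness, a minimal sketch of that configuration; the connection string and the Database table types are placeholders, not part of my actual setup:)

import { Kysely, PostgresDialect } from 'kysely';
import { Pool } from 'pg';

// Placeholder for your generated table types.
interface Database {
  users: { id: string; name: string };
}

const db = new Kysely<Database>({
  dialect: new PostgresDialect({
    pool: new Pool({
      connectionString: process.env.DATABASE_URL, // placeholder env var
      max: 1, // single connection for serverless
    }),
  }),
});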

When is it important to use Client instead of Pool in node-postgres? Are there any issues associated with using Pool with max: 1 in serverless?

Joe Lapp
  • It's important to use a client only when you need to control that you get the same single client every time (and, probably, that nothing else is using that one client). – Bergi May 09 '23 at 20:02
  • Using a `Pool` with a maximum of 1 connection vs a single client vs a new client for every request only should make a difference regarding the number of idle open connections to the database and the (average) time it takes for a request handler to acquire the connected client. It also depends on how your "serverless" environment handles concurrent requests and how long it keeps your application open to receive the next request vs starting it from scratch. – Bergi May 09 '23 at 20:06
  • Thank you @Bergi. I guess multi-query transactions would fall into that first category. I'm not sure how to think about your comments on pooling 1 connection, as I can only see a difference in when the connection might be acquired. – Joe Lapp May 09 '23 at 21:35

1 Answer

When is it important to use Client instead of Pool in node-postgres?

In the case of serverless functions, you should probably always use the Pool. However, sometimes you need to borrow ("check out") a Client from the pool, e.g. for transactions or prepared statements:

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the PG* environment variables

const myId = "20";
const client = await pool.connect(); // borrow a client from the pool
try {
  const result = await client
    .query('SELECT * FROM users WHERE id = $1', [myId]);
  console.log(result.rows);
} finally {
  client.release(); // return the client to the pool
}
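
For example (a sketch of the standard pg transaction pattern; the accounts table is illustrative), a multi-query transaction must run every statement on the same checked-out client, since BEGIN/COMMIT apply to a single connection:

// A transaction needs the *same* client for every statement,
// so pool.query() (which may use different clients) won't do.
const client = await pool.connect();
try {
  await client.query('BEGIN');
  await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [100, 1]);
  await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [100, 2]);
  await client.query('COMMIT');
} catch (e) {
  await client.query('ROLLBACK'); // undo the partial work on error
  throw e;
} finally {
  client.release();
}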

A Pool of size 1 acts just like a single client. Using a pool from the start, though, lets you later run multiple SQL queries in parallel within the same executing function, since each client can only run one query at a time.

// JS will wait for query one before even sending query two.
// If query two isn't affected by one, there's no need to await in between.
const one = await client.query(/* ... */);
const two = await client.query(/* ... */);

// Looks like the requests can be concurrent...
// but with `max: 1` (or a single client) they still run one at a time.
const onePromise = client.query(/* ... */);
const twoPromise = client.query(/* ... */);
const [one, two] = await Promise.all([onePromise, twoPromise]);

// Truly concurrent requests using a Pool (max > 1):
// each query is dispatched on its own client.
const onePromise = pool.query(/* ... */);
const twoPromise = pool.query(/* ... */);
const [one, two] = await Promise.all([onePromise, twoPromise]);

Are there any issues associated with using Pool with max: 1 in serverless?

Yes. In some cases, it might be slow. If you're running queries in parallel, a pool with max: 1 or a single client has the same issue: you have to wait for the first query to finish before starting the second.

When to limit serverless concurrency or pool size

In rare cases with huge demand, and therefore more serverless functions running simultaneously, you might exhaust Postgres's max_connections limit ("The default is typically 100 connections.").

Even if you capped concurrent serverless functions at 100, each function might open more than one client connection (if the pool size is > 1). This happens when you run multiple queries in parallel in the same function. So with max_connections=100 and, for example, 5 parallel queries per function, you'll hit the limit with just 20 concurrent functions.

Either (or do all of them):

  • reduce the max pool size (might slow down your function; see the sketch after this list)
  • reduce the worker concurrency (will make your users/clients wait for a function to be freed)
  • allow more client connections on Postgres (raise max_connections)
  • try Neon ("serverless" postgres) connection pooling:

To overcome this limitation, Neon supports connection pooling using PgBouncer, allowing Neon to support up to 10000 concurrent connections.
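
A minimal sketch of capping the pool size (the environment variable name is a placeholder; max and idleTimeoutMillis are standard pg Pool options):

import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // placeholder env var
  max: 2, // at most 2 Postgres connections per function instance
  idleTimeoutMillis: 1000, // release idle clients quickly in short-lived functions
});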

Ben Butterworth
  • Wow. Thanks for the thorough response addressing what I need to be doing rather than what I thought I should be doing. You sent me googling and searching for quite a bit. It appears to me that in serverless, each function has its own context, allowing one pool per function (not per server). A connection cannot outlast a call, and Node is single-threaded, so the pool will never use more than one connection. The benefit to pooling would therefore be reusing the same (initially closed) connection *object* in each function call. `max: 1` would therefore be redundant. Am I grokking this? – Joe Lapp May 26 '23 at 00:59
  • I'm also reading that serverless functions must close connections at completion. Node-postgres removes the cached connection object when the connection is closed. It seems that no connections get pooled, so I'd be wasting resources pooling them (on the client). Now I'm back to thinking that it is wise to create connections directly rather than pool them serverless, and I don't know what to make of your response. "When a function is invoked, a connection to the database is opened. Upon completion, the connection is closed." https://vercel.com/guides/connection-pooling-with-serverless-functions – Joe Lapp May 26 '23 at 01:17
  • I'm glad you found it useful. `A connection cannot outlast a call, and Node is single-threaded, so the pool will never use more than one connection.`. One function can use many clients if you run multiple independent queries concurrently. Even if Node is single-threaded, using a pool means it can make all the network requests to the database immediately (albeit sequentially). It doesn't have to wait for a response for the first query before running the second one `req(A),req(B),res(A),res(B).`. If you had 1-client/`max:1`, it does `req(A),res(A),req(B),res(B)`. – Ben Butterworth May 26 '23 at 06:01
  • `The benefit to pooling would therefore be reusing the same (initially closed) connection object in each function call. max: 1 would therefore be redundant.` You don't re-use anything, since it's a different instantiation of the function. You can't share variables/instances between different functions. – Ben Butterworth May 26 '23 at 06:02
  • Oh, you might find https://neon.tech/blog/quicker-serverless-postgres as interesting as I do – Ben Butterworth May 26 '23 at 07:09
  • Your Neon link says, "we can’t use standard client-side database connection pooling". Like many sites, it recommends pgBouncer. So I still don't get what you're saying. As I understand things, because each function must close the connection it opens before returning, and because node-postgres only pools open connections, not just connection objects, there can be no reuse between functions, not even concurrently. The open connections in a pool wouldn't be reusable; any function might be waiting on a response on that connection. Pooled connections can't be reused, making pooling superfluous. – Joe Lapp May 26 '23 at 11:16
  • Pooled connections can't be reused *between functions*, but a pool can be useful in the same function in some cases: I updated my answer before your reply to show when that would be useful. See second code snippet – Ben Butterworth May 26 '23 at 11:55
  • Got it! Yes, that makes sense. Now help me make sense of your first statement, "In the case of serverless functions, you should probably always use the Pool." It seems to me that very few functions would be simultaneously opening multiple connections. – Joe Lapp May 26 '23 at 15:50
  • Sure, in my case, I guess I'm just thinking ahead - I might have to update that sentence since it might bring more issues than it solves. I would instantiate the Client/Pool in the same place for any endpoint (e.g. in `createContext` of tRPC). Some endpoints need SQL transactions/prepared statements, but I don't want to create both `Pool` and `Client`, and decide between them. Since you can get a Client from a Pool, it's easy to always create a Pool in `createContext`. – Ben Butterworth May 26 '23 at 16:03
  • In fact, my suggestion might be wasteful. Perhaps multiple clients will be instantiated as part of the pool, taking up CPU time, and then just unused. Good point, and thanks! – Ben Butterworth May 26 '23 at 16:04