
I have a db and a client app that does reads and writes. I need to handle a lot of concurrent reads, but be sure that writes get priority, while also respecting my db's connection limit.

Long version:
I have a single-instance PostgreSQL database that allows 100 connections. My .NET microservice uses Npgsql to connect to the db. It has to run read queries that can take 20-2000 ms and writes that can take about 500-2000 ms. Right now there are 2 instances of the app, connecting with the same user credentials. I am trusting Npgsql to manage my connection pooling, and I am preparing my read queries, as there are basically just 2 or 3 variants with different parameter values.
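For context, the read path looks roughly like this (a simplified sketch, not my real code; the query, table and the SomeDataItem constructor are placeholders):

private static readonly string _connString =
   "Host=my-db-host;Username=app_user;Password=...;Database=appdb"; // placeholder values

public async Task<List<SomeDataItem>> QuerySeries(long seriesId, CancellationToken ct)
{
   await using var conn = new NpgsqlConnection(_connString);
   await conn.OpenAsync(ct); // physical connection comes from Npgsql's pool

   await using var cmd = new NpgsqlCommand(
      "SELECT ts, value FROM some_table WHERE series_id = @id", conn);
   cmd.Parameters.AddWithValue("id", seriesId);
   await cmd.PrepareAsync(ct); // same 2-3 statements reused, so preparing pays off

   var items = new List<SomeDataItem>();
   await using var reader = await cmd.ExecuteReaderAsync(ct);
   while (await reader.ReadAsync(ct))
      items.Add(new SomeDataItem(reader.GetDateTime(0), reader.GetDouble(1)));
   return items;
}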

As user requests increased, I started having problems with the database’s connection limit. Errors like ‘Too many connections’ from the db.

To deal with this I introduced a simple gate system in my repo class:

private static readonly SemaphoreSlim _writeGate = new(20, 20);
private static readonly SemaphoreSlim _readGate = new(25, 25);

public async Task<IEnumerable<SomeDataItem>> ReadData(string query, CancellationToken ct)
{
   await _readGate.WaitAsync(ct);
   try
   {
      // ... run the read query and return the data ...
   }
   finally { _readGate.Release(); }
}

public async Task WriteData(IEnumerable<SomeDataItem> items, CancellationToken ct)
{
   await _writeGate.WaitAsync(ct);
   try
   {
      // ... write the data ...
   }
   finally { _writeGate.Release(); }
}

I chose to have separate gates for reads and writes because I wanted to be confident that reads would not get completely blocked by concurrent writes. The limits are hardcoded as above, a total limit of 45 on each of the 2 app instances, connecting to 1 db server instance. It is more important that attempts to write data do not fail than attempts to read. I have some further safety here with a Polly retry pattern.

This was alright for a while, but as concurrent read requests increase, response times start to degrade as a backlog of read requests accumulates.

So, for this question, assume my SQL queries and db schema are optimized to the max: what can I do to improve my throughput?

I know that there are times when my _readGate is maxed out while there is free capacity in the _writeGate. However, I don't dare shift capacity from the write gate to the read gate, because at other times I need to support that many concurrent writes. So I need some kind of QoS solution that can allow more concurrent reads when possible, but will give priority to writes when needed.
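To make it concrete, the behaviour I'm imagining is something like the untested sketch below (the numbers are placeholders). It's really reserved write capacity rather than strict priority: reads are capped so some slots always remain for writes, but writes can use any free slot.

private static readonly SemaphoreSlim _totalGate = new(45, 45); // all DB work on this instance shares these slots
private static readonly SemaphoreSlim _readCap = new(35, 35);   // reads may never hold more than 35 of them

public async Task<IEnumerable<SomeDataItem>> ReadData(string query, CancellationToken ct)
{
   await _readCap.WaitAsync(ct);   // keeps at least 10 slots free for writes
   try
   {
      await _totalGate.WaitAsync(ct);
      try
      {
         // ... run the read query ...
      }
      finally { _totalGate.Release(); }
   }
   finally { _readCap.Release(); }
}

public async Task WriteData(IEnumerable<SomeDataItem> items, CancellationToken ct)
{
   await _totalGate.WaitAsync(ct); // writes only contend for the shared total, so they can use any free slot
   try
   {
      // ... write the data ...
   }
   finally { _totalGate.Release(); }
}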

Queue management is pretty complicated to me, but it's also quite well known to many, so is there a good NuGet package that can help me out? (I'm not even sure what to google.)
Is there a simple change to my code to improve on what I have above?
Would it help to have different conn strings / users for reads vs writes?
Anything else I can do with Npgsql / the connection string that can improve things?

I think that PostgreSQL recommends limiting connections to 100; there's a SO thread on this here: How to increase the max connections in postgres? There's always a limit to how many simultaneous queries you can run before performance stops improving and eventually drops off. However, I can see in my Azure telemetry that my db server is not coming close to fully using CPU, RAM or disk IO (CPU doesn't exceed 70% and is often less, memory the same, and IOPS under 30% of capacity), so I believe there is more to be squeezed out somewhere :)

Maybe there are other places to investigate, but for the sake of this question I'd just like to focus on how to better manage connections.

  • Where did you hear of the 100 connections limit? It's a default, but that's it. – Frank Heikens May 04 '22 at 19:49
  • "assuming my sql queries and db schema are optimized to the max" Start by questioning your assumptions. – jjanes May 04 '22 at 20:22
  • "However I can see that my db server is not coming close to fully using cpu, ram or disk IO" How do you see that? Like, what tool are you using and what is it showing? For disk, are you looking at disk throughput alone, or also random IOPS capacity? – jjanes May 04 '22 at 20:26
  • Edited the post in response to comments – Ian Ferguson May 05 '22 at 07:16
  • PostgreSQL doesn't recommend a connection limit of 100. It has a default of 100, but it's up to you and your (virtual) hardware to select the best setting for your environment. I think your queries are slow and that's why the database doesn't have any connections left for new processes. But that's just a wild guess because you didn't show us any query at all, nor results from EXPLAIN(ANALYZE, VERBOSE, BUFFERS). One of my databases has 12 active connections and handles 400 million transactions per day, for a couple of million customers. Never needed 100 connections – Frank Heikens May 05 '22 at 09:55
  • Fair comment, I just wanted to avoid getting into the query here, to focus on conn mgmt, and the question was long enough already :) It's timeseries data using TimescaleDB, there are a few TB on disk, and the queries are pulling quite a lot of rows for a single series. I'm pretty confident with my raw data queries, but have some doubts about the aggregates; I'm looking into writing a PGX extension to improve this. – Ian Ferguson May 05 '22 at 20:45

1 Answer


First, if you're getting "Too many connections" on the PostgreSQL side, that means that the total number of physical connections being opened by Npgsql exceeds the max_connections setting in PG. You need to make sure that the aggregate total of Npgsql's Max Pool Size across all app instances doesn't exceed that, so if your max_connections is 100 and you have two app instances, each needs to run with Max Pool Size=50.
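For example, something along these lines per app instance (a sketch; host, user and database names are placeholders, and the canonical Npgsql keyword is "Maximum Pool Size" - the same setting referred to as Max Pool Size above):

// Assuming max_connections = 100 on the server and 2 app instances:
var connString =
   "Host=my-db-host;Username=app_user;Password=...;Database=appdb;" +
   "Maximum Pool Size=50";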

Second, you can indeed have different connection pools for reads vs. writes, by having different connection strings (a good trick for that is to set the Application Name to different values). However, you may want to set up one or more read replicas (primary/secondary setup); this would allow all read workload to be directed to the read replica(s), while keeping the primary for write operations only. This is a good load balancing technique, and Npgsql 6.0 has introduced great support for it (https://www.npgsql.org/doc/failover-and-load-balancing.html).
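A rough sketch of what that split could look like on the client (host names, users and pool sizes are placeholders; the multi-host keywords are the ones described in the failover/load-balancing doc linked above):

// Writes always go to the primary, via their own pool.
var writeConnString =
   "Host=pg-primary;Username=writer;Password=...;Database=appdb;" +
   "Application Name=svc-writes;Maximum Pool Size=30";

// Reads prefer the standbys and are load-balanced across them,
// falling back to the primary if no standby is available.
var readConnString =
   "Host=pg-primary,pg-replica1,pg-replica2;Username=reader;Password=...;Database=appdb;" +
   "Application Name=svc-reads;Maximum Pool Size=20;" +
   "Target Session Attributes=prefer-standby;Load Balance Hosts=true";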

Apart from that, you can definitely experiment with increasing max_connections on the PG side - and accordingly Max Pool Size on the clients' side - and load-test what this does to resource utilization.

Shay Rojansky
  • Thanks, if I do create separate read/write users, but still have a single-instance db, can I still achieve shared total connections with writes getting priority? Can pgbouncer help here? I thought it was not needed if Npgsql is managing my conns. I'd hoped to avoid replicas (my lack of competence), but setting up read replicas sounds like a good reason to look into it. – Ian Ferguson May 05 '22 at 20:44
  • What do you mean by "achieve shared total connections"? If you have separate read/write users, then connections with those users are pooled separately. This guarantees you always have X write connection available by keeping them reserved for writes only - but then they're not shared (two separate pools). – Shay Rojansky May 06 '22 at 08:26
  • pgbouncer's main advantage is allowing you to pool centrally in one place - Npgsql's pool is in-process, so each application instance has its own. It's also possible to use both - a per-instance pool in Npgsql, plus a single pg_bouncer in front of your PG. However, that generally won't change the read/write mechanics we're discussing here - a pool is a pool. – Shay Rojansky May 06 '22 at 08:28
  • By shared connections I meant that the read client would be able to use up to the max conns allowed by the server, but then the write client would get priority on up to say 30 if it needed (and vice versa). Thanks for the explanation of pgbouncer conn pooling - that would let all client instances share the full max allowed conns of the server, and could be an in-between step until I can set up a read replica. I'll also tinker with max connections and pool size and see how that helps. – Ian Ferguson May 10 '22 at 06:18
  • What does it mean for the write client to "get priority on up to say 30" connections? What happens if all connections happen to be taken up by readers, and a writer suddenly appears? Should a reader's connection somehow get broken mid-query? – Shay Rojansky May 10 '22 at 07:59