
I am facing a complex challenge with an RDS PostgreSQL instance, and I am almost out of ideas on how to handle it. I am launching an app (React + Go + PostgreSQL) for which I expect around 250-300 users simultaneously making the same API GET request for as long as they wish to use the app.

It is a questionnaire-style app: a user retrieves one question from the database and answers it, the server saves the answer in the DB, and then the user presses next to fetch the next question. I tested my API endpoint with k6 using 500 virtual users for 2 minutes, and the database returns `dial: i/o timeout` or sometimes even a rejected connection, usually once it reaches around 6000 requests, and I end up with roughly 93% success. I tried to fine-tune the RDS instance with the tcp_keepalives parameters but without any luck; I still cannot get 100% of the requests to pass. I also tried increasing the general storage from the 20 GB minimum to 100 GB in RDS and switching from the free-tier db.t3.micro to db.t3.medium.

Any hint would be much appreciated. It should be possible for a normal Go server with Postgres to handle these requests at the same time, shouldn't it? It is just a regular `select * from x where y` statement.

EDIT (CODE SAMPLE): I use a dependency injection pattern, so I have only one instance of the DB passed to all the other repositories, including the API package. The DB repository looks like this:

func NewRepository() (DBRepository, error) {
  dbname := getenv("POSTGRES_DATABASE", "")
  username := getenv("POSTGRES_ROOT_USERNAME", "")
  password := getenv("POSTGRES_ROOT_PASSWORD", "")
  host := getenv("POSTGRES_HOST", "")
  port := getenv("POSTGRES_PORT", "")

  // Build the DSN from environment variables.
  dsn := fmt.Sprintf("host=%s user=%s password=%s"+
    " dbname=%s port=%s sslmode=disable"+
    " TimeZone=Europe/Bucharest", host, username, password, dbname, port)

  db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
  if err != nil {
    return nil, err
  }

  db.AutoMigrate(
    // migrated tables are here
  )

  return &dbRepository{
    db: db,
  }, nil
}
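
For context, I have not set any pool limits myself. As far as I understand, gorm v2 exposes the underlying *sql.DB via db.DB(), so a cap could be added right after gorm.Open, roughly like this (just a sketch; the numbers are placeholders, not values I currently use, and "time" would need to be imported):

// Sketch only: cap the connection pool behind gorm; the numbers are placeholders.
sqlDB, err := db.DB()
if err != nil {
  return nil, err
}
sqlDB.SetMaxOpenConns(50)                  // hard cap on simultaneous connections to RDS
sqlDB.SetMaxIdleConns(25)                  // keep some connections warm between requests
sqlDB.SetConnMaxLifetime(30 * time.Minute) // recycle connections periodically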

Currently the parameters used in RDS for TCP keepalive are:

tcp_keepalives_count 30

tcp_keepalives_idle 1000

tcp_keepalives_interval 1000

I also tried these parameters with different values.
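
To see whether I am actually exhausting connections, I was thinking of logging the number of open backends while the k6 test runs; a minimal sketch, assuming the same gorm handle and that my RDS user can read pg_stat_activity:

// Sketch: count open PostgreSQL backends during the load test.
var open int64
if err := db.Raw("SELECT count(*) FROM pg_stat_activity").Scan(&open).Error; err != nil {
  log.Printf("could not read pg_stat_activity: %v", err)
} else {
  log.Printf("open connections: %d", open)
}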

The query I am doing is a simple .Find() statement from the gorm package, but it seems like this is not the issue, since it gets blocked whenever it hits the first query/connection with the DB. There are 2 queries executed in the endpoint I am testing, but it gets stuck on the first one. If more info is needed I will update, but this issue is getting really frustrating.
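
To narrow down where that first query blocks (waiting for a free connection vs. the query itself), I could wrap it in a context with a timeout; a rough sketch, assuming gorm v2's WithContext, where Question and quizID are placeholders for my real model and filter:

// Sketch: give the first query a deadline so a pool/connection wait surfaces as a context error.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

var questions []Question // placeholder model, stands in for my real table
err := db.WithContext(ctx).Where("quiz_id = ?", quizID).Find(&questions).Error
if err != nil {
  log.Printf("first query failed: %v", err) // "context deadline exceeded" would point at waiting, not the query plan
}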

My k6 test is the following:

import http from 'k6/http';
import { check } from 'k6';
import { sleep } from 'k6';

export const options = {
  insecureSkipTLSVerify: true,
  stages: [
    { target: 225, duration: '2m' },
  ],
};

const access_tokens = []

let random_token = access_tokens[Math.floor(Math.random()*access_tokens.length)];

const params = {
    headers: {'Authorization': `Bearer ${random_token}`}
};

export default function () {
  let res =  http.get('endpoint here', params);
  check(res, {'Message': (r)=> r.status === 202});  
  sleep(1);
}

The DB tables are also indexed, and I have checked the queries with EXPLAIN.

Albert Berty
  • Check the [logs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.PostgreSQL.html) and monitor the number of open connections as the test runs ([here](https://stackoverflow.com/q/27435839/11810946) and perhaps [limit this](https://pkg.go.dev/database/sql#DB.SetMaxOpenConns)). My guess would be that the issue is in your code (for example, not [closing](https://pkg.go.dev/database/sql#Rows.Close) result sets). For a real answer provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Brits Jul 26 '22 at 22:06
  • It needs to support 300, and it craps out at 500. What is the problem? This sounds like victory. – jjanes Jul 27 '22 at 02:15
  • Hi @Brits, I updated the post and I hope it includes enough information now. Looking forward to hearing from you! – Albert Berty Jul 27 '22 at 08:15
  • "it gets blocked whenever hits the first query/connection with the db" - please expand on this (I thought your API worked OK until under heavy load?). Please try to produce a working example (connect to database, Listen for HTTP connections and run your handler - should be petty short). It may also be worth halving the target and doubling the duration (to confirm if it is the load or related to the number of queries run). Sorry - can't really say any more with the info available (it's possible you are hitting a database limit we have insufficient info to comment). – Brits Jul 27 '22 at 09:06

0 Answers