We are sending multiple requests to a gRPC server, but every once in a while a call fails with a "Connection reset by peer" error and UNAVAILABLE status.

gRPC server: NestJS

Client: Python

Python version: 3.8

grpcio version: 1.50.0

Code:

# Connect to server from client:

def connect_to_user_manager_server() -> AuthorizationControllerStub:
    channel = grpc.insecure_channel(envs.USER_MANAGER_GRPC_URL, options=(
        ('grpc.keepalive_time_ms', 120000),
        ('grpc.keepalive_permit_without_calls', True),
    ))
    stub = AuthorizationControllerStub(channel)

    return stub

client = connect_to_user_manager_server()

user_response = client.CheckAuthorization(authorizationData(authorization=token, requiredRoles=roles))
AminAli
To debug this further, you would need to figure out why the connection is being reset by the peer. One way is to enable the HTTP and TCP traces in gRPC (GRPC_TRACE=http,tcp and GRPC_VERBOSITY=DEBUG). Looking at the logs from both the client and the server could help explain what's going on here. – Yash Tibrewal Jan 18 '23 at 19:09
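As a minimal sketch of the tracing suggestion in the comment above, the two variables can be set from inside the Python client. Setting them before grpc is imported is an assumption made here to be safe; exporting them in the shell before starting the process should work just as well.

```python
import os

# Set the tracing variables named in the comment above.
# Assumption: they are set before `import grpc` runs anywhere in the process,
# so the gRPC core sees them when it initializes.
os.environ["GRPC_VERBOSITY"] = "DEBUG"
os.environ["GRPC_TRACE"] = "http,tcp"

# Import grpc and build the channel only after this point.
```

The trace output is verbose, so it is probably best enabled only while reproducing the reset.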

1 Answer

You can add retry logic to your client code, either with a library such as retrying or by implementing it manually.

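For instance, you can use the retrying library. Note that grpc.insecure_channel() connects lazily and does not raise when the connection drops; the UNAVAILABLE error surfaces on the RPC call, so the call is what needs to be retried:

import grpc
from retrying import retry

def is_unavailable(exc):
    # Retry only on UNAVAILABLE, the status reported for the connection reset
    return isinstance(exc, grpc.RpcError) and exc.code() == grpc.StatusCode.UNAVAILABLE

@retry(stop_max_attempt_number=3, wait_fixed=1000, retry_on_exception=is_unavailable)
def check_authorization(client, token, roles):
    # Wrap the RPC itself, not the channel creation
    return client.CheckAuthorization(authorizationData(authorization=token, requiredRoles=roles))

This makes up to 3 attempts at the CheckAuthorization call, waiting 1 second between attempts, and retries only when the status is UNAVAILABLE.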

You can also implement it manually using a loop and a try/except block around the RPC call, like this:

import time
import grpc

max_attempts = 3
for attempt in range(1, max_attempts + 1):
    try:
        # The channel connects lazily, so the reset shows up on the call itself
        user_response = client.CheckAuthorization(
            authorizationData(authorization=token, requiredRoles=roles))
        break
    except grpc.RpcError as e:
        # Give up on other status codes, or once the attempts are used up
        if e.code() != grpc.StatusCode.UNAVAILABLE or attempt == max_attempts:
            raise
        time.sleep(1)

This also makes up to 3 attempts at the call, with a 1 second delay between attempts.

You can adjust the number of retries and the delay time to fit your needs.
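As an alternative sketch, gRPC can also retry inside the channel through a retry policy in the service config, which keeps the number of attempts and the delay in one place. The helper name build_retry_options and its parameters are hypothetical; the option keys grpc.enable_retries and grpc.service_config and the retryPolicy fields follow the gRPC retry design as far as I know, so check them against the documentation for your grpcio version.

```python
import json


def build_retry_options(max_attempts=3, backoff_seconds=1):
    """Build channel options that ask gRPC to retry UNAVAILABLE calls.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    service_config = {
        "methodConfig": [
            {
                # An empty name entry applies the policy to every method
                "name": [{}],
                "retryPolicy": {
                    "maxAttempts": max_attempts,
                    "initialBackoff": f"{backoff_seconds}s",
                    "maxBackoff": f"{backoff_seconds}s",
                    "backoffMultiplier": 1,
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }
    return (
        ("grpc.enable_retries", 1),
        ("grpc.service_config", json.dumps(service_config)),
    )


# Usage (assumes grpc and envs are imported as in the question):
# channel = grpc.insecure_channel(
#     envs.USER_MANAGER_GRPC_URL,
#     options=build_retry_options(max_attempts=3, backoff_seconds=1),
# )
```

gRPC applies jitter to the backoff, so the actual delay between attempts will not be exactly the configured value. The keepalive options from the question can be appended to the returned tuple.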