
I have a fairly simple AWS Lambda function in which I connect to an Amazon Keyspaces (for Apache Cassandra) database. The Python code below works, but from time to time I get the error shown. How do I fix this strange behavior? My assumption is that additional settings are needed when initializing the cluster, for example `set_max_connections_per_host`. I would appreciate any help.

ERROR:

('Unable to complete the operation against any hosts', {<Host: X.XXX.XX.XXX:XXXX eu-central-1>: ConnectionShutdown('Connection to X.XXX.XX.XXX:XXXX was closed')})

lambda_function.py:

import sessions


cassandra_db_session = None
cassandra_db_username = 'your-username'
cassandra_db_password = 'your-password'
cassandra_db_endpoints = ['your-endpoint']
cassandra_db_port = 9142


def lambda_handler(event, context):
    global cassandra_db_session
    if not cassandra_db_session:
        cassandra_db_session = sessions.create_cassandra_session(
            cassandra_db_username,
            cassandra_db_password,
            cassandra_db_endpoints,
            cassandra_db_port
        )
    result = cassandra_db_session.execute('select * from "your-keyspace"."your-table";')
    return 'ok'

sessions.py:

from ssl import SSLContext
from ssl import CERT_REQUIRED
from ssl import PROTOCOL_TLSv1_2
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy


def create_cassandra_session(db_username, db_password, db_endpoints, db_port):
    ssl_context = SSLContext(PROTOCOL_TLSv1_2)
    ssl_context.load_verify_locations('your-path/AmazonRootCA1.pem')
    ssl_context.verify_mode = CERT_REQUIRED
    auth_provider = PlainTextAuthProvider(username=db_username, password=db_password)
    cluster = Cluster(
        db_endpoints,
        ssl_context=ssl_context,
        auth_provider=auth_provider,
        port=db_port,
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='eu-central-1'),
        protocol_version=4,
        connect_timeout=60
    )
    session = cluster.connect()
    return session
Nurzhan Nogerbek

2 Answers


There isn't much point setting the max connections on the client side, since AWS Lambda functions are effectively "dead" between runs. For the same reason, the recommendation is to disable driver heartbeats (with `idle_heartbeat_interval = 0`), since no activity occurs until the next time the function is invoked.
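As a sketch, disabling heartbeats would be a one-argument change to the `Cluster` initialization in the question's `create_cassandra_session` (the surrounding variables `db_endpoints`, `ssl_context`, `auth_provider`, and `db_port` are the ones already defined there; only `idle_heartbeat_interval=0` is new):

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy

# Sketch only: same Cluster settings as in the question, with driver
# heartbeats disabled so the frozen Lambda doesn't hold an idle socket
# that the server may close underneath it.
cluster = Cluster(
    db_endpoints,
    ssl_context=ssl_context,
    auth_provider=auth_provider,
    port=db_port,
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='eu-central-1'),
    protocol_version=4,
    connect_timeout=60,
    idle_heartbeat_interval=0  # 0 disables heartbeats entirely
)
```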

This isn't necessarily the cause of the issue you're seeing, but there's a good chance the driver is reusing a connection after it has been closed server-side.
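If the connection is indeed being closed server-side between invocations, one defensive pattern (not from the answer, purely an illustration) is to drop the cached session and retry once when a connection-level error surfaces. The helper below and its callback names are hypothetical:

```python
# Illustrative only: rebuild the session once on a retryable
# (connection-level) error, then re-run the statement.

def execute_with_retry(query, get_session, reset_session,
                       retryable=(Exception,), attempts=2):
    """Run `query`; on a retryable error, reset the cached session and retry."""
    last_error = None
    for _ in range(attempts):
        try:
            return get_session().execute(query)
        except retryable as exc:
            last_error = exc
            reset_session()  # drop the broken session so it gets rebuilt
    raise last_error
```

In the Lambda, `get_session` would return the module-level cached session (creating it if needed) and `reset_session` would set it back to `None`; `retryable` would be the driver's connection errors (e.g. `NoHostAvailable`).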

With the lack of public documentation on the inner workings of AWS Keyspaces, it's difficult to know what is happening on the cluster. I've always suspected that AWS Keyspaces is a CQL-like API engine in front of DynamoDB, so there are quirks like the one you're seeing that are hard to track down, since doing so requires knowledge only available internally at AWS.

FWIW the DataStax drivers aren't tested against AWS Keyspaces.

Dharman
Erick Ramirez
  • Thank you very much for your recommendations. Because it takes a very long time to connect to the database each time, I defined the connection to the database outside of the `lambda_handler` function. The connection to the database will be created the first time the function is called. Any subsequent function call will use the same database connection. If the AWS Lambda function is not utilized for 15 minutes, the connection disappears. How correct is this in your opinion? – Nurzhan Nogerbek Oct 15 '20 at 04:23
  • I will try adding `idle_heartbeat_interval` during cluster initialization and test it today during the day. I'll let you know the results. – Nurzhan Nogerbek Oct 15 '20 at 04:23
  • Again, it's hard to know how AWS Keyspaces handles the connections so it's hard to comment. Cheers! – Erick Ramirez Oct 15 '20 at 04:48

This is the biggest issue I see:

result = cassandra_db_session.execute('select * from "your-keyspace"."your-table";')

The code looks fine, but I don't see a WHERE clause. So if there's a lot of data, a single node (chosen as the coordinator) will have to build the result set while pulling data from all other nodes. As this results in unpredictably bad performance, it could explain why it works sometimes but not others.

Pro-tip: All queries in Cassandra should have a WHERE clause.
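For illustration, a single-partition read would look something like this (the keyspace, table, and key column names are placeholders carried over from the question; `session` is the connected session):

```python
# Placeholder names: restricting the read to one partition means the
# coordinator only touches the replicas for that key instead of
# scanning the whole table.
stmt = session.prepare(
    'SELECT * FROM "your-keyspace"."your-table" WHERE id = ?'
)
rows = session.execute(stmt, [record_id])
```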

Aaron
  • Hello! In this post, I forgot to specify it, but in reality, I use the `WHERE` clause. What else can cause the previously mentioned error? I think I need to add additional settings to the cluster, but I do not know which ones. What do you think about this? – Nurzhan Nogerbek Oct 15 '20 at 03:33