1

I am running Cassandra with its default configuration in a default Docker installation on Windows 11. The data table contains 19 rows.

The Python driver is exceptionally slow and fails with a connection timeout in about 20% of cases.

At first I suspected Docker or the container configuration, but I noticed that RazorSQL has no issues. I therefore ran a performance test comparing the official DataStax Python driver with the official DataStax .NET driver.

The results are devastating:

  • Python: 22.908 seconds (!)
  • .NET: 0.168 seconds

Is this normal behavior of the python driver?

My python code:

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import time
start = time.time()
for i in range(10):
    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
    session = cluster.connect("rds")
    session.execute("SELECT COUNT(*) FROM data").one()
end = time.time()
print((end - start)/10)

My C# code:

using Cassandra;
using System;
using System.Diagnostics;
public void TestReliability()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 100; i++)
    {
        Test();
    }
    stopwatch.Stop();
    Console.WriteLine("Average connect + one query in ms: " + (stopwatch.ElapsedMilliseconds / 100));
}
public void Test()
{
    Cluster cluster = Cluster.Builder().AddContactPoint("localhost").WithAuthProvider(new PlainTextAuthProvider("cassandra", "cassandra")).Build();
    ISession session = cluster.Connect("rds");
    var result=session.Execute("SELECT COUNT(*) FROM data");
    session.Dispose();
    cluster.Dispose();
}

EDIT: The Python driver does not crash when the timeout is set high enough (35 seconds!).

Erick Ramirez
  • 13,964
  • 1
  • 18
  • 23
SalkinD
  • 753
  • 9
  • 23
  • Are you sure you are getting connected? Is this running in a container attempting to reach a database on the host itself? – JonSG Apr 04 '23 at 16:30
  • C# gets a fast connection on the same machine every time. Python has exceptionally high connection times. – SalkinD Apr 04 '23 at 17:07

2 Answers

2

In Cassandra applications, Cluster and Session should be singletons: both are stateful (they handle load balancing and failover), and opening connections is expensive.

Here you are opening a new connection on every iteration of the loop. Move these three lines outside the loop and the problem should go away:

auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
cluster = Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
session = cluster.connect("rds")
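
As a rough sketch of the singleton approach (not the only way to do it), the session can be created once on first use and then reused for every query. The helper names get_session, default_connect and average_query_seconds are made up for this illustration; the credentials, contact point and keyspace are the ones from the question. The driver import sits inside default_connect so the rest of the module can be exercised without a running cluster.

```python
import time

_session = None  # process-wide session, created lazily on first use


def default_connect():
    """Open a real connection (needs the cassandra-driver package installed)."""
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
    return cluster.connect("rds")


def get_session(connect=default_connect):
    """Return the shared session, connecting only on the first call."""
    global _session
    if _session is None:
        _session = connect()
    return _session


def average_query_seconds(session, runs=10):
    """Average time per query when the same session is reused for every run."""
    start = time.perf_counter()
    for _ in range(runs):
        session.execute("SELECT COUNT(*) FROM data").one()
    return (time.perf_counter() - start) / runs
```

Calling average_query_seconds(get_session()) then measures query time only, since the connection cost is paid once.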
clunven
  • 1,360
  • 6
  • 13
  • This is just for performance testing, providing an average connection time. Same happens with normal single use. (And, assuming correct implementation, this code should not alter per-connection performance) – SalkinD Apr 04 '23 at 17:08
1

Your test looks invalid to me (more on this later). You're breaking the usage guidelines, mainly (1) use a single cluster object, and (2) use a single session object for the life of the application, because (3) maintaining multiple instances is expensive.

But specifically on the sample code you posted, you are not comparing apples to apples.

In your C# code, you are making explicit calls to Dispose():

    session.Dispose();
    cluster.Dispose();

which close all connections and clean up resources. However, you are not doing the same thing in your Python code, which means that the older connections (and associated resources) are still maintained by the app in the background.

To make your two code samples more comparable, you should call Session.shutdown() and Cluster.shutdown() in the Python version. For more info, see the cassandra.cluster API Doc for the Python driver.
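
A minimal sketch of what a more comparable Python benchmark might look like is below. Each iteration connects, runs the query and then shuts down the session and cluster, mirroring the Dispose() calls in the C# version. The function names (make_cluster, timed_connect_and_query, average_connect_seconds) are invented for this example; the connection settings are taken from the question.

```python
import time


def make_cluster():
    """Build a Cluster with the question's settings (needs cassandra-driver)."""
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    return Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)


def timed_connect_and_query(cluster_factory=make_cluster, keyspace="rds"):
    """Time one connect + query + shutdown cycle, like Test() in the C# code."""
    start = time.perf_counter()
    cluster = cluster_factory()
    session = None
    try:
        session = cluster.connect(keyspace)
        session.execute("SELECT COUNT(*) FROM data").one()
    finally:
        # Release connections even if connect() or the query raised.
        if session is not None:
            session.shutdown()
        cluster.shutdown()
    return time.perf_counter() - start


def average_connect_seconds(runs=10, cluster_factory=make_cluster):
    """Average duration of a full cycle over the given number of runs."""
    timings = [timed_connect_and_query(cluster_factory) for _ in range(runs)]
    return sum(timings) / len(timings)
```

With this version no connections from earlier iterations linger while later ones are being timed.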

In any case, your test isn't valid because it isn't how applications behave in real life. If you tell us what problem you're trying to solve or what you're trying to achieve, we would be able to provide a better answer.

If you are interested, I recommend having a look at Best practices for Cassandra drivers. Cheers!

Erick Ramirez
  • The loop is just for testing, to get an average value. As I correctly closed the connection, I think this is a valid approach to get the average connection time. I managed to resolve the issue by updating the driver to the new version published last week. As this problem was a driver issue, I will delete my question tomorrow. Thanks for your help! – SalkinD Apr 05 '23 at 18:14