Cosmos DB Cassandra API Indexes that span partitions

Question

We are in the process of moving our application from on prem to Azure. We are currently using Cassandra, and the plan is to use Cosmos DB Cassandra API in Azure. In Cassandra, the general rule of thumb is that an index query should be confined to a single partition; otherwise it is better to use Materialized Views or secondary tables.

Does the same hold true for Cosmos DB? If I have a query that would return ~20 rows of data that come from 20 different partitions, can I accomplish this by using an index (without incurring significant performance or cost penalties), or should I create a secondary table?

As an aside, I am aware that Cosmos DB Cassandra API has recently introduced Materialized Views, but since this feature is still in Preview, we are not going to use it.

score 1 · Accepted Answer · answered May 08 '23 at 20:02


This rule of thumb generally holds for any distributed database (i.e. one that supports transparent sharding/partitioning), including Azure Cosmos DB. With that said, cross partition queries are not necessarily a disaster if they are not frequent, and the latency is tolerable for the user.
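To make the trade-off concrete, here is a minimal toy model of the two options from the question. It is only a sketch: the partitions are plain Python lists, `partition_for` is a made-up stand-in for a real partitioner, and the table and column names (`orders`, `orders_by_customer`, `order_id`, `customer_id`) are hypothetical. It does not reflect how Cosmos DB or Cassandra store data or charge for requests; it only counts how many partitions a query has to visit.

```python
from collections import defaultdict

# Toy model only: not real Cosmos DB or Cassandra behaviour.
NUM_PARTITIONS = 20


def partition_for(key: str) -> int:
    # Deterministic stand-in for a partitioner (hypothetical, not the real hash).
    return sum(key.encode()) % NUM_PARTITIONS


class ToyTable:
    def __init__(self, partition_key: str):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)

    def insert(self, row: dict) -> None:
        # Route each row to a partition based on its partition key value.
        self.partitions[partition_for(row[self.partition_key])].append(row)

    def query(self, column: str, value: str):
        """Return (matching rows, number of partitions visited)."""
        if column == self.partition_key:
            # The partition key is known, so only one partition is consulted.
            targets = [partition_for(value)]
        else:
            # No partition key in the filter: every partition must be asked.
            targets = range(NUM_PARTITIONS)
        rows = [r for p in targets for r in self.partitions[p] if r[column] == value]
        return rows, len(targets)


# Base table partitioned by order_id; secondary table partitioned by customer_id.
orders = ToyTable(partition_key="order_id")
orders_by_customer = ToyTable(partition_key="customer_id")

for i in range(20):
    row = {"order_id": f"order-{i}", "customer_id": "c42"}
    orders.insert(row)
    orders_by_customer.insert(row)  # the application writes both copies

rows_a, visited_a = orders.query("customer_id", "c42")
rows_b, visited_b = orders_by_customer.query("customer_id", "c42")

print(len(rows_a), visited_a)  # index-style query on the base table
print(len(rows_b), visited_b)  # lookup against the secondary table
```

Both queries return the same 20 rows, but the first one visits every partition while the second visits one. The secondary table pays for that with an extra write per row, which is the usual trade-off to weigh against how often the query runs.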

By the way, if you are planning a migration from on prem, it is worth considering Azure Managed Instance for Apache Cassandra. This is a managed hosting service for pure open-source Apache Cassandra, built by the Azure Cosmos DB team. Most notably, it supports hybrid clusters, meaning that you can deploy a Cassandra data center with this service in Azure, but have it join your existing on prem Cassandra ring (as long as you have the required networking in place, and are running open-source Apache Cassandra v3.11 or higher). This will make zero-downtime migration to Azure cloud much more straightforward.

Theo van Kraay


Thanks for the response. I understand that cross partition queries are an antipattern for all distributed databases. That was not my question. In Cassandra, even creating cross partition indices is an antipattern. To make cross partition queries work in Cassandra, I would need to use secondary tables or Materialized Views. My question is: is it OK to create cross partition indices in Cosmos DB Cassandra API? Would such indices be performant for high volume reads? Unfortunately, the decision to use Cosmos DB Cassandra API vs Cassandra Managed Instance is out of my hands. – Eugene May 08 '23 at 20:44

If by "cross partition indices" you mean creating an index on a field that is not the partition key, where the intention is to execute queries that filter on that field but don't provide the partition key in the filter, then yes, those queries in Cosmos DB will incur similar issues as they would in Cassandra. With that said, in general reads are more efficient in Cosmos DB than Cassandra, because there are no read repairs, tombstones do not need to be skipped over, and SSTables do not need to be compacted. So even in your use case, performance may be better in Cosmos DB. – Theo van Kraay May 09 '23 at 11:34
