2

I have an instance method that is used to construct a Cassandra query, and I'm looking for a way to more safely parametrize the table name in a query without using string formatting.

For example,

def some_method(self, table_name):
    statement = f"SELECT COUNT(*) FROM {self.table_name}"
    results = self.cassandra.session.execute(query=statement)
    ...

I don't believe SimpleStatement or a PreparedStatement can be used to bind a table name.

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135
Wes Doyle
  • 2,199
  • 3
  • 17
  • 32
  • 1
    If no safe methods can be found (glancing at the source it does appear that the bound values are to columns), perhaps querying for a list of table names that already exist and check that `self.table_name` is found in any of them, which should _hopefully_ be safe enough (subject to table names not named in funny ways). – metatoaster Jan 23 '19 at 23:32
  • Did you check this? [How to pass the table name in parameter in cassandra python](https://stackoverflow.com/questions/54292761/how-to-pass-the-table-name-in-parameter-in-cassandra-python) – Horia Jan 24 '19 at 04:59
  • Possible duplicate of [How to pass the table name in parameter in cassandra python](https://stackoverflow.com/questions/54292761/how-to-pass-the-table-name-in-parameter-in-cassandra-python) – Alex Ott Jan 24 '19 at 08:28
  • 1
    I’m specifically looking for a safe way to do this without using string formatting. – Wes Doyle Jan 24 '19 at 13:37
  • 1
    I'm also looking for a way to do this that's both safe and satisfies linters like flake8 and bandit warning about potential SQL injections due to the string formatting, even though the variable value is never coming from user/external input at all (ex. from a dict mapping) – Gino Mempin May 03 '23 at 05:06

1 Answers1

1

If you are using DataStax's Python Driver for Cassandra (as of v3.28), there is no way to parametrize or bind the table name using Session.execute / Session.execute_async, because it only expects the column values to be in the parameters argument, and the table name to be hardcoded in the statement string. You'll have to resort to string formatting, which then makes your linters complain about possible SQL injections (which you then have to silence with # noqa and such).

I managed to work around this by using cassandra.cqlengine.models instead which is still part of the same Python driver package: https://docs.datastax.com/en/developer/python-driver/3.28/api/cassandra/cqlengine/models/.

You can dynamically create the model for your table during runtime (during which you can pass the table names from somewhere), and then use that model to perform the queries.

First, define an abstract model:

from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table

class MyAbstractModel(Model):
    __abstract__ = True

    __keyspace__ = "your_keyspace"

    # The table name will be defined during runtime
    # __table_name__ = ...

    id = columns.Integer(primary_key=True)
    first_name = columns.Text()
    last_name = columns.Text()

Then on runtime, create a concrete model:

def get_table_name() -> str:
    # Return some table name based on some app-specific logic.
    return "Person"

table_name = get_table_name()
table_attrs = {"__table_name__": table_name}
person_model = type(
    table_name,
    (MyAbstractModel,),
    table_attrs,
)

# Not shown here is setting-up the connection to the cluster
# and setting-up a session, as that's not part of the question.
# Assume an active session is already available.
sync_table(model=person_model)

The type call is Python's built-in type function for creating a type, in this case a concrete Person class based on cassandra.cqlengine.models.Model. The arguments to type in order are:

If your code runs successfully, you'll see that the table is created:

cqlsh> DESC TABLE your_keyspace.person;

CREATE TABLE your_keyspace.person (
    id int PRIMARY KEY,
    first_name text,
    last_name text
) WITH ...

cqlsh> SELECT COUNT(*) FROM your_keyspace.person;

 count
-------
     0

Once you've got your table, you can now use that to make your queries as a replacement to using session.execute. Depending on how many tables you'll need to create, you'll probably need a mapping of the table names to their models:

all_models = {
    "person": person_model,
    "food": food_model,
    "drink": drink_model,
    ...
}

Copying the example from the question where the table name is a parameter of some_method:

def some_method(self, table_name):
    model = all_models.get(table_name)
    if not model:
      raise InvalidTableNameException

    row = model(id=1234, first_name="Foo", last_name="Bar")
    row.save()

Since the models are dynamically created, just make sure that

  • It is still a valid Python class name
  • It is still a valid Cassandra table name
  • The __table_name__ and the type name are consistent

Read the documentation on Making Queries with models.

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135