4

I am learning cassandra with Python and specially with Django 2 using cqlengine. I am trying to search in database where I search for string that starts with the search parameter but I want to make it case insensitive. So if I have following data

-------------------------------
|    PKID    |     String     |
-------------------------------
|    1234    |     FOObar     |
|    4321    |     FoOBar     |
|    5665    |     IreALLy    |
|    5995    |     DontknoW   |
|    8765    |     WHatTOdo   |
|    4327    |     foobaR     |
-------------------------------

So if I want to search for string that starts with foo, I should get all three records. I searched for the solution and I found one comment on stackoverflow that everything is byte in cassandra and so it is not possible but I also found something that says I need to write custom function to do it.

For Django I am using django-cassandra-engine to create model. It is an implementation of cqlengine for django. So when I create my model, is there anything that I need to add in it? My test model is

class TestModel(DjangoCassandraModel):
    key_id = columns.UUID(primary_key=True, default=uuid.uuid4)
    string = columns.Text()

I looked for it in cqlengine docs but couldn't find anything helpful. So I am seeking for help here on stackoverflow.

Thanks.

Viral Joshi
  • 317
  • 2
  • 15

1 Answers1

5

You can use so-called SASI-index (SSTable Attached Secondary Index) to do it in Cassandra itself (although it's marked as experimental feature). You can define indexes for doing prefix, range, or substring search, and when configuring index, you can specify that you want case-insensitive strings:

CREATE CUSTOM INDEX index_name ON table (field) 
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = { 
    'mode': 'PREFIX', // if you want to search only for 'starting with'
    'case_sensitive': 'false'
};

Although I'm not sure, will cqlengine pickup this functionality out of the box, or not.

Here is very detailed blog post about SASI-indexes.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
  • Thanks @Alex. I will read the blog post and try to implement it in my database and see if I can get it working. – Viral Joshi May 20 '18 at 13:23
  • Im using this : CREATE CUSTOM INDEX ON device_by_username ( comm_nr ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzed' : 'true', 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive' : 'false', 'mode' : 'CONTAINS' }; which seems to work so far – Alex Tbk May 22 '18 at 06:23