2

Following are the two queries that I need to perform.

select * from where dept = 100 and emp_id = 1;

select * from where dept = 100 and name = 'One';

Which of the below options is better ?

Option 1: Use secondary index along with a partition key. I assume this way query will be executed faster as there is no need to go different nodes and index needs to be searched only locally.

cqlsh:d2> desc table emp_by_dept;

CREATE TABLE d2.emp_by_dept (
    dept int,
    emp_id int,
    name text,
    PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX emp_by_dept_name_idx ON d2.emp_by_dept (name);

cqlsh:d2> select * from emp_by_dept where dept = 100;

 dept | emp_id | name
------+--------+------
  100 |      1 |  One
  100 |      2 |  Two
  100 |     10 |  Ten

(3 rows)



 activity                                                                                        | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-15 17:36:55.860000 | 10.0.2.16 |              0
                       Parsing select * from emp_by_dept where dept = 100; [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 |            202
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 |            418
                           Executing single-partition query on emp_by_dept [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10525
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10564
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10635
                                               Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10748
                               Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10757
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 |          18141
                                Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 |          18166
                                        Read 3 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 |          18335
                                                                                Request complete | 2015-06-15 17:36:55.928174 | 10.0.2.16 |          68174





cqlsh:d2> select * from emp_by_dept where dept = 100 and name = 'One';

 dept | emp_id | name
------+--------+------
  100 |      1 |  One

(1 rows)

Tracing session: c56e70a0-1357-11e5-ab8b-fb5400f1b4af

 activity                                                                                                                                                                                                                                                                                                                | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                                                                                                                                                                                                                      Execute CQL3 query | 2015-06-15 17:42:20.010000 | 10.0.2.16 |              0
                                                                                                                                                                                                                              Parsing select * from emp_by_dept where dept = 100 and name = 'One'; [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 |             12
                                                                                                                                                                                                                                                                               Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 |             19
                                                                                                                                                                                                                                                                         Computing ranges to query [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |            881
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |           1144
                                                                                                                                                                                               Submitting range requests on 1 ranges with a concurrency of 1 (0.003515625 rows per range expected) [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |           1238
                                                                                                                                                                                                                                                             Executing indexed scan for [100, 100] [SharedPool-Worker-2] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |           1703
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           1827
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           1929
                                                                                                                                                                                                                              Executing single-partition query on emp_by_dept.emp_by_dept_name_idx [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2058
                                                                                                                                                                                                                                                                      Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2087
                                                                                                                                                                                                                                                                       Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2173
                                                                                                                                                                                                                                                                       Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2352
                                                                                                                                                                                                                                                 Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.012001 | 10.0.2.16 |           2377
                                                                                                                                                                                                                         Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 |           4300
                                                                                                                                                                                                                                                        Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 |           4322
                                                                                                                                                                                                                                           Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-06-15 17:42:20.031000 | 10.0.2.16 |          21798
                                                                                                                                                                                                                                                                Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          21989
                                                                                                                                                                                                                                                   Executing single-partition query on emp_by_dept [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22374
                                                                                                                                                                                                                                                                      Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22385
                                                                                                                                                                                                                                                                       Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22433
                                                                                                                                                                                                                                                                       Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22514
                                                                                                                                                                                                                                                 Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22523
                                                                                                                                                                                                                         Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          22963
                                                                                                                                                                                                                                                        Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          22972
                                                                                                                                                                                                                                                                Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          22991
                                                                                                                                                                                                                                                                      Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          23096
                                                                                                                                                                                                                                                                                                        Request complete | 2015-06-15 17:42:20.033227 | 10.0.2.16 |          23227

Option 2: Create 2 tables as below.

   CREATE TABLE d2.emp_by_dept (
        dept int,
        emp_id int,
        name text,
        PRIMARY KEY (dept, emp_id)
    ) WITH CLUSTERING ORDER BY (emp_id ASC);

select * from emp_by_dept where dept = 100 and emp_id = 1;

    CREATE TABLE d2.emp_by_dept_name (
        dept int,
        emp_id int,
        name text,
        PRIMARY KEY (dept, name)
    ) WITH CLUSTERING ORDER BY (name ASC);

select * from emp_by_dept_name where dept = 100 and name = 'One';
Community
  • 1
  • 1

4 Answers4

2

Normally it is a good approach to use secondary indexes together with the partition key, because - as you say - the secondary key lookup can be performed on a single machine.

The other concept that needs to be taken into account is the cardinality of the secondary index. In your case emp_id is probably unique, and name is almost unique, so the index will most probably return a single row, and therefore it is not too efficient. For a good explanation I recommend this article: http://www.wentnet.com/blog/?p=77.

As consequence, if query time is critical and you can update both tables in the same time, I recommend using your option 2.

It would also be interesting to measure the two options with some generated data.

medvekoma
  • 1,179
  • 7
  • 11
1

Option one won't be possible, as Cassandra does not support queries using both primary keys and secondary keys. Your best bet, would be to go with option two.

Although the similarities are many, don't think of it as a 'relational table'. Instead think of it as a nested, sorted map data structure. Cassandra believes in de-normalization and duplication of data for better read performance. Therefore, option 2 is completely normal and within the best practices of Cassandra.

Few links which you might find useful - http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

How do secondary indexes work in Cassandra?

Hope this helps.

Community
  • 1
  • 1
booleanhunter
  • 5,780
  • 4
  • 16
  • 21
0

Since maintaining two tables is harder than maintaining a single, the first option would be more preferable.

Simha
  • 149
  • 3
  • 9
0

Query1 = select * from <> where dept = 100 and emp_id = 1;

Query2 = select * from <> where dept = 100 and name = 'One';

Option 1:

Write : time to write to emp_by_dept + time to update index

Read : Query1 will be a direct read from emp_by_dept, Query2 will be a read from emp_by_dept + get the location from index table + read the value from emp_by_dept

Option 2:

Write : time to write to emp_by_dept + time to write to emp_by_dept_name

Read: Query1 will be a direct read from emp_by_dept, Query2 will be a direct read from emp_by_dept_name (the required data is already sorted and kept )

So I assume write time should be almost the same in both cases (I have not tested this)

If your read response time is more important, then go for Option2.

If you are worried about maintaining 2 tables, go for option 1.

Thanks everyone for your inputs.