Cassandra 1.1 composite keys, columns, and filtering in CQL 3

Question

I wish to have a table something as follows:

CREATE TABLE ProductFamilies (
  ID varchar,
  PriceLow int,
  PriceHigh int,
  MassLow int,
  MassHigh int,
  MnfGeo int,
  MnfID bigint,
  Data varchar,
  PRIMARY KEY (ID)
);

There are 13 fields in total. Most of these represent buckets. Data is a JSON of product family IDs, which are then used in a subsequent query. Given how Cassandra works, the column names under the hood will be the values. I wish to filter these.

I wish to run queries as follows:

SELECT Data FROM MyApp.ProductFamilies WHERE ID IN (?, ?, ?) AND PriceLow >= ? 
AND PriceHigh <= ? AND MassLow >= ? AND MassHigh <= ? and MnfGeo >= ? AND 
MnfGeo <= ?

I read that Cassandra can only execute WHERE predicates against composite row keys or indexed columns. Is this still true? If so, I would have to make the columns < Data part of the PK.
Is it still the case that one has to include all columns from left to right and cannot skip any?
Are there any non-optimum points in my design?
I would like to add a column "Materials", which is an array of possible materials in a product family. Think pizza toppings, and querying "WHERE Materials IN ('Pineapple')". Without creating a separate inverted index of materials and then performing a manual intersection against the above query, is there any other [more elegant] way of handling this in Cassandra?

Once again this is my answer and it still holds true :D http://stackoverflow.com/questions/11825783/cassandra-and-querying-hierarchies/11828376#11828376 — Tamil, Sep 06 '12 at 11:08
Yes... thank you, but what about the point of only being able to filter against the PK? I know your answer is true ;) — IamIC, Sep 06 '12 at 13:49
"Model your data as per your read pattern" is the rule of thumb in cassandra. Only reason for the above is you can't do joins, you can't filter on a column value until you index one :) So, Mention your full usecase and read pattern for us to help :P — Tamil, Sep 06 '12 at 14:07
I thought I did mention my use case :P... let me expand the question. — IamIC, Sep 06 '12 at 14:08
My data is modeled as per my read pattern. Based on your prev. answer, I believe I have this correct. As I understand it, CQL 3 is going to store the values as column names, excluding the final data value. — IamIC, Sep 06 '12 at 14:11
http://itsallabtamil.blogspot.in/2012/10/cassandra-compositetype-overview-cql.html Check this out ;) You are the right person to comment on it :) — Tamil, Oct 02 '12 at 08:58

score 3 · Answer 1 · answered Sep 06 '12 at 14:10

3

If you specify the exact PK you are looking up, as you propose here (id IN ...), you can use whatever expressions you like in the remaining predicates. There are no restrictions.

List collections are supported starting in 1.2.0, which is scheduled for release at the end of October. Indexed querying of collection contents may or may not be supported.

answered Sep 06 '12 at 14:10

jbellis

19,347
2
38
47

I saw an email from you indicating #1 (WHERE must be against PK components). So this has def. changed? – IamIC Sep 06 '12 at 14:13
Will list collections support push and pop? – IamIC Sep 06 '12 at 14:14
@jbellis it would be great if you could validate my answer :) – Tamil Sep 06 '12 at 17:14
And, does this means that the WHERE clause can be used in schmeless designs so long as the PK is specified (actually, I don't get the connection between PK and can-use-WHERE). – IamIC Sep 06 '12 at 21:00
@jbellis Please see this question: http://stackoverflow.com/questions/12609008/unable-to-perform-range-queries-in-cassandra-1-2 – IamIC Sep 26 '12 at 19:59

Tamil · Accepted Answer · 2012-09-27T06:07:44.183

Basically to support you queries you need to have

create column family ProductFamilies with 
comparator='CompositeType(UTF8Type, Int32Type, Int32Type, Int32Type, Int32Type, Int32Type, LongType, UTF8Type)' 
and key_validation_class='UTF8Type'

or

CREATE TABLE ProductFamilies (
  ID varchar,
  PriceLow int,
  PriceHigh int,
  MassLow int,
  MassHigh int,
  MnfGeo int,
  MnfID bigint,
  Data varchar,
  PRIMARY KEY (ID, PriceLow, PriceHigh, MassLow, MnfGeo, MnfID, Data)
);

~~Now you can query~~

SELECT Data FROM MyApp.ProductFamilies WHERE ID IN (?, ?, ?) AND PriceLow >= ? 
AND PriceHigh <= ? AND MassLow >= ? AND MassHigh <= ? and MnfGeo >= ? AND 
MnfGeo <= ?

Provided you don't miss any column from left to right [although not a filter but atleast a *] and all your values are in the column names rather the value.

One more thing you should understand about composite columns is "Column Slice must be contiguous" So, pricelow > =10 and pricelow <= 40 will return you a contiguous slice but filtering the result set with masslow and other columns will not work as it is not going to result in a contiguous slice. BTW pricelow = 10 and masslow <= 20 and masslow >=10 should work [tested with phpcassa] as it will result in a contiguous slice once again.

Else create a or multiple secondary index on any of the column of yours. Then you have the rights to query based on column values provided you always have atleast one of the indexed field in query. http://www.datastax.com/docs/1.1/ddl/indexes

Regarding you material question there is no other go than having an inverted index if it is going to be a multivalued column as of I know.

It would be great if @jbellis verifies this

Why did you write "PRIMARY KEY (ID, PriceLow, PriceHigh, MassLow, MnfGeo, MnfID, Data)"? At the least, surely Data would be the final value? @jbellis appears to be indicating that one no longer needs to include all queried columns in the PK. — IamIC, Sep 06 '12 at 20:45
@IanC I hope if I'm understanding the tweets example of composite column correctly http://www.datastax.com/docs/1.1/ddl/column_family#composite-columns then what I have mentioned should work for u — Tamil, Sep 07 '12 at 02:30
It will work, yes. I just think the new version has relaxed some of the earlier requirements. — IamIC, Sep 07 '12 at 13:47
I finally tried this. Please see this question: http://stackoverflow.com/questions/12609008/unable-to-perform-range-queries-in-cassandra-1-2 — IamIC, Sep 26 '12 at 20:04

Cassandra 1.1 composite keys, columns, and filtering in CQL 3

2 Answers2

Linked