How can I use the time series data model in Cassandra?

Question

I'm new to Cassandra but I've seen Thrift examples earlier where I can model the columns as:

id | start_time | end_time | total_value | value + [timeStamp1] | value + [timeStamp2]...

Is it possible to do this with a single column family with CQL? I can see that I can make a composite key of (id, timestamp) and store the values against the timestamp, and repeat the event level metadata for each row as part of denormalization, but would that still be storing it in one big row?

score 2 · Accepted Answer · answered Oct 06 '14 at 12:52

Yes, you can do this in Cassandra with a single table. The idea is to use a partition key (id) and a clustering key (the timestamp). All data with the same partition key is written into one big row (a single partition):

CREATE TABLE timeseries (id uuid, ts timestamp, info text, otherinfo text, PRIMARY KEY (id, ts));

With this table you can query all events for a specific id over a time range:

SELECT * FROM timeseries WHERE id = someid AND ts > 0 AND ts < 100;
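As a rough illustration of the wide-row layout, here is a sketch that writes several events under the same id and then reads a slice of them. The uuid, timestamps, and text values are made-up sample data, not from the question:

```cql
-- Hypothetical sample data: three events for the same id.
-- They share a partition key, so they are stored in the same partition,
-- ordered by the clustering key ts.
INSERT INTO timeseries (id, ts, info, otherinfo)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, '2014-10-06 12:00:00+0000', 'event 1', 'a');

INSERT INTO timeseries (id, ts, info, otherinfo)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, '2014-10-06 12:30:00+0000', 'event 2', 'b');

INSERT INTO timeseries (id, ts, info, otherinfo)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, '2014-10-06 13:00:00+0000', 'event 3', 'c');

-- Range query on the clustering key within one partition.
-- With strict bounds, only the 12:30 event matches.
SELECT ts, info
FROM timeseries
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
  AND ts > '2014-10-06 12:00:00+0000'
  AND ts < '2014-10-06 13:00:00+0000';
```

Each INSERT shows up as a separate row in CQL, but all three belong to the same partition, which is what the Thrift-style "wide row" corresponds to.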

For each id you will have a wide row containing the events. As for "repeating the event metadata as denormalization": if the other information does not change for a given id, you should declare those columns as static. Then, no matter how many events you have within a row, these columns are stored only once (it's a smart denormalization).
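A minimal sketch of what the static declaration could look like, using the column names from the question. The table name `timeseries_with_meta`, the column types, and the sample values are assumptions for illustration, and static columns need a Cassandra version that supports them (2.0.6 or later, as far as I know):

```cql
-- Hypothetical variant of the table: per-id metadata is declared STATIC.
CREATE TABLE timeseries_with_meta (
    id          uuid,
    ts          timestamp,
    start_time  timestamp STATIC,  -- stored once per partition (per id)
    end_time    timestamp STATIC,
    total_value double STATIC,
    value       double,            -- one per event (per ts)
    PRIMARY KEY (id, ts)
);

-- Write the metadata once for the id; static-only writes need no clustering key.
INSERT INTO timeseries_with_meta (id, start_time, end_time, total_value)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
        '2014-10-06 12:00:00+0000', '2014-10-06 13:00:00+0000', 42.0);

-- Write the individual events without repeating the metadata.
INSERT INTO timeseries_with_meta (id, ts, value)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, '2014-10-06 12:15:00+0000', 20.0);

INSERT INTO timeseries_with_meta (id, ts, value)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, '2014-10-06 12:45:00+0000', 22.0);

-- Every event row returned for this id carries the same static values.
SELECT ts, value, start_time, end_time, total_value
FROM timeseries_with_meta
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
```

Updating a static column later changes the value seen by all rows of that id, since there is only one copy per partition.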

HTH, Carlo

Carlo Bertuccini

Won't the use of `ts` in the primary key cause problems here? Your example table seems to contain one timestamp per row, which would mean a new row for all values, but the OP seems to want a wide row, in which case the columns themselves should be the timestamps, so that range seeks work after a row ID is identified. – ely Oct 06 '14 at 12:56
Great! The static keyword is what I was missing. – samyem Oct 06 '14 at 12:56
@EMS In my example I'm using a wide row: the partition for the wide row is the id key. It's not one timestamp per row; you can have many timestamps for the same id, and they will all be located under the same partition (wide row). This is because the primary key is made up of a partition key and a clustering key. Take a look at http://stackoverflow.com/questions/24949676 – Carlo Bertuccini Oct 06 '14 at 13:03
I am still very confused: why do you want your timestamp `ts` to be used for clustering? I don't see where that comes into this question. Are you saying that your `ts` will be like the OP's `start_time`? I just don't see how partitioning the data (from row to row) matters here, since we are talking about storing lots of timestamps within a single row. I understand that your example could be modified to add lots of new columns to the rows as you've made them, but I still don't see why your `ts` cluster key was made to begin with. – ely Oct 06 '14 at 13:16
Or are you saying that the `ts` timestamp will chunk the wide rows into more than one wide row? So, `ts` for you might be each month, where the individual timestamps within the row are every half-hour during the month (or something)? That is, are you clustering here to deal with a potential 2000 column limit in a way that is handled implicitly by table key properties? – ely Oct 06 '14 at 13:21
