2

Let assume 10 devices(dev01,dev02,dev03..etc).

It send data with some interval time,we collect those data,so our data schema is

 dev01      :int
 signalname :string
 signaltime :date/time[with YY-MM-DD HHMMSS.mm]
 Extradata  :String

I want to push data into cassandra ,which way is best to store those data?

My Query is Like ,

1 Need to retrive device based current day data,or with some date range?

2 5 Device current day data?

I am not sure the following way to store data into cassadra is best model

Standard columnfamily Name:signalname
row key                   :dev01
columnname                :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue               :Json data
columnname                :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue               :Json data

row key               :dev02
columnname            :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue           :Json data
columnname            :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue           :Json data

Or  

Super columnfamily   :signalname
row key              :Clientid1

supercolumnname      :dev01
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

supercolumnname      :dev02
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data


row key              :Clientid2

supercolumnname      :dev03
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

supercolumnname      :dev04
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

kindly help me out regarding this issue, Any other Way?

Thanks&Regards, Kannadhasan

kannadhasan
  • 339
  • 4
  • 17

1 Answers1

5

I see 3 issues with your approach here which I will address below:

  • super column families,
  • thrift vs cql3,
  • json data as cell values.

Before you go ahead: the use super column families is discouraged. Read more here. Composite keys (as described below) are the way to go.

Also, you might need to read up on CQL3, since thrift is a legacy API since 1.2.

Instead of storing json data, you may make use of native collection data types like lists, and maps etc. If you still want to work with JSON, there is improved JSON support in in Cassandra since version 2.2.

In general, it is pretty straightforward to query per device and per timeperiod:

  • you row key would be the device id and the column key a timeuuid
  • To avoid hot spots, you could add "bucket" counters to the row key (create a composite row/partition key) to rotate the nodes
  • You can then query for time ranges if you know the row/device id.

Alternatively you could use your signal type as a row key (and timeuuid/timestamp as a column key) if you want to query data for multiple devices (but one event type) at once. Read more on timeseries data in cassandra in this blog entry.

Hope that helps!

Community
  • 1
  • 1
John
  • 1,462
  • 10
  • 17
  • thanks omnibear,so you suggested to use composite columnfamily primary key as device id,timestamp followed some column name,column value – kannadhasan Aug 02 '13 at 05:36
  • while we using cql3 to read single row key having timeseries 10000 record it is taking nearly 8 min for 4 clumn ,performance is slow – kannadhasan Aug 02 '13 at 06:33
  • Could you update your question above to reflect those latest changes? For instance, could you give your CQL3 "create table" statement? Otherwise, it is hard to give a helpful answer. – John Aug 02 '13 at 07:18
  • _Composite row key_ means: two values are combined into one row/partition key, for instance "deviceid" and "bucket". `create table ..... primary key((deviceid, bucket), signaltime)`. All column keys (here: signaltime) are by default _composite column keys_ if there is more than one (this is not the case here). If oyu have a composite column key, then you can only run range queries on the last column key. The first has to be queried using an EQ operator. – John Aug 02 '13 at 07:21
  • CREATE COLUMNFAMILY deviceidcomposite(did varchar,signalid varchar,lat varchar,lan varchar,timestamp varchar,PRIMARY KEY (did,timestamp)) – kannadhasan Aug 02 '13 at 07:36
  • 1) Change the "timestamp" varchar into a timeUUID or a timestamp data type. 2) Use this column key in your query and re-check the performance. (For instance, `select * from deviceidcomposite where did = 0 and timestamp > 1375429645 and timestamp < 1375160400;`) 3) If you use timestamp/timeuuid, always use UTC+0 as a timezone to avoid problems in a distributed environment. – John Aug 02 '13 at 07:49