0

I am struggling with data modeling in cassandra where i have different attributes for different organizations. As there would be any number of attributes i am unable to model a dynamic number of columns in schema. Secondly, when i use map for this, i am unable to query against those attributes or index them etc. Am i missing something or this is a limitation in cassandra?


Scenario

one organization selects specific attributes to collect data for and they can change those attributes anytime. When they change, number of attributes and name of attributes changes. If previously we were collecting data for att1,attr2,attr3, now we are collecting attr4,attr5,attr6,attr7,attr8,attr9. And this can be changed at anytime for any organization. Furthermore, organization will be searching massively on those attributes.

  1. How can we model such scenario in cassandra.
  2. if it's a limitation, what could be the alternatives of cassandra where we have read/write (mostly write and often read. Not update/delete) proficiency.
  3. Do we have to combine any other framework with cassandra? like lucene etc

Thanks in advance.

Community
  • 1
  • 1
vicky
  • 561
  • 1
  • 7
  • 17
  • Please check this article by datastax : https://www.datastax.com/blog/2013/06/does-cql-support-dynamic-columns-wide-rows Hope this helps – Pharaoh Apr 01 '20 at 10:15

1 Answers1

0

This case really requires more information about queries that are executed, etc.

in simplest case, just put the attribute name as a clustering column in addition to existing, like this:

create table tbl (
  id int,
  collected timestamp,
  attr_name text,
  attr_value int,
  primary key(id, collected, attr_name);

in this case you can select either individual attribute if you do

select * from tbl where id = ... and collected = ... and attr_name = 'attrX';

or you can select all attributes by just omitting the attr_name:

select * from tbl where id = ... and collected = ...;

but it will work only when all attribute values have the same data type. If they could be different, then you may need to add more fields for every data type.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
  • Thanks for your response. Now each row to show user = all attributes collected at that time (collected column you used above). Then how would i join them? group by? or UDF? – vicky Apr 02 '20 at 02:24
  • but i have to group them in one row exactly. How will i then group so that i can get 1-id 2-collected 3-attr1:value,attr2:value:attr3:value......... all this as a one row – vicky Apr 02 '20 at 12:12
  • to show you don't need to have them as one row - just iterate through all results & combine into the data structure that you're using for displaying - you're not displaying the Row itself I think, you need to convert it into some data structure – Alex Ott Apr 02 '20 at 12:32
  • that's on the server side but i wanted to group them at query side. i am not familiar with it's UDF, can you tell me if that can somehow do it? otherwise i will do it over server side. – vicky Apr 02 '20 at 14:00
  • It's better not to use UDFs - they could lead to performance problems, etc. Also, even UDF won't solve your problem - the best it can do is to collect data into a map, so you will get back a map of names to values, but it's not the same as row – Alex Ott Apr 02 '20 at 14:14
  • then the best solution is to do it at server side. Thanks – vicky Apr 02 '20 at 16:23