I read somewhere that for a row with thousands of columns in a standard column family, it is better design to split them into super columns. Supposedly this makes reads very efficient, because Cassandra only needs to load and return the columns under a given super column name, instead of loading and possibly returning thousands of columns. Can anyone please confirm?
2 Answers
That's not good advice. At this point, there are a very small number of use cases for which super columns are the best solution. The new CompositeTypes are a better solution for most of what super columns have been used for historically.
With that said, it sounds like you don't need CompositeTypes here either. It's true that if you're reading a very large row, you shouldn't pull back the entire row at once. Instead, you should fetch portions of the row in contiguous slices.
Basically, you'll be performing a series of `get_slice()` calls. For the first one, set the column count to, say, 1000, and the column start to "". Then, take the last column name from that set of results (call it X), and make another `get_slice()` call with a column count of 1000, but this time, set the column start to X. Discard the first column you get back (it will be X), and then repeat the whole `get_slice()` process until the query returns fewer than 1000 columns, which signals that you've hit the end of the row.
You may want to fetch more than or less than 1000 at a time, depending on your column size.
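The paging loop described above can be sketched as follows. This is a minimal illustration, not real client code: `get_slice(row, start, count)` here is a hypothetical in-memory stand-in for the Thrift call (it is an assumption that the start name is inclusive and columns come back in sorted order, as the answer describes), and `iter_columns` and `page_size` are names made up for the example. The sketch stops only when a page yields no new columns, which is a more conservative test than checking for a short page.

```python
from bisect import bisect_left


def get_slice(row, start, count):
    """Hypothetical stand-in for a get_slice() call.

    `row` is a dict of column name -> value. Returns up to `count`
    (name, value) pairs whose name is >= `start`, in sorted name order.
    """
    names = sorted(row)
    i = bisect_left(names, start)
    return [(n, row[n]) for n in names[i:i + count]]


def iter_columns(row, page_size=1000):
    """Yield every column of `row`, fetching one contiguous slice at a time."""
    if page_size < 2:
        # With a page of 1, every page after the first would hold only the
        # repeated column X, so the loop could never make progress.
        raise ValueError("page_size must be at least 2")
    last = None  # name of the last column seen so far (X in the text)
    while True:
        page = get_slice(row, "" if last is None else last, page_size)
        # The start is inclusive, so the first column repeats X: discard it.
        if last is not None and page and page[0][0] == last:
            page = page[1:]
        if not page:
            return  # no new columns: end of the row
        yield from page
        last = page[-1][0]
```

With a real client, only the body of `get_slice` would change; tune `page_size` to the size of your columns.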

Note that a query that returns fewer than 1,000 columns may not signal the end. In my experience, there were times when I got fewer columns back than I asked for. You should read until it returns zero, which is probably a simpler algorithm anyway. Also, I'm glad you specified that the number of columns to read should depend on the size of your columns. I often use just 100 because some of my columns have loads of data. – Alexis Wilke Mar 01 '16 at 02:17
If there will be many columns or the data should be indexed, then it is better to create a normal column family because: 1) super CF sub-columns are not indexed, and 2) any request for a sub-column deserializes all the sub-columns in the super column. But that might just be a limitation of the current code base; see http://wiki.apache.org/cassandra/CassandraLimitations
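To illustrate why a normal column family can still serve "give me one group of columns" reads, here is a toy in-memory model, not Cassandra code. It assumes column names are two-part `(group, name)` tuples kept in sorted order, loosely in the spirit of composite column names; `slice_group` is a hypothetical helper, and real composite encoding and comparison rules differ.

```python
from bisect import bisect_left


def slice_group(columns, group):
    """Return only the columns whose name starts with `group`.

    `columns` is a list of ((group, name), value) pairs sorted by name,
    a toy model of one wide row in a normal column family.
    """
    keys = [k for k, _ in columns]
    lo = bisect_left(keys, (group,))
    # group + "\x00" is the smallest string strictly greater than group,
    # so this marks the first column belonging to any later group.
    hi = bisect_left(keys, (group + "\x00",))
    return columns[lo:hi]
```

Because the names sort by group first, one contiguous range covers a group, and nothing outside that range has to be touched.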

What's the difference between the CQL `CREATE TABLE` with multiple columns and the super columns? Because it feels equivalent to me... – Alexis Wilke Mar 01 '16 at 02:30