Advantages of using cql over thrift

Question

Are there any distinct advantages to using CQL over Thrift, or is it simply a case of developers being too used to SQL? I want to switch from Thrift querying to CQL; the only problem is that I'm not sure about the downsides of doing so. What are they?

score 22 · Answer 1 · answered Mar 29 '13 at 17:19

Lyuben's answer is a good one, but I believe he may be misinformed on a few points. First, you should be aware that the Thrift API is not going to be getting new features; it's there for backwards compatibility, and not recommended for new projects. There are already some features that cannot be used through the Thrift interface.

Another factor is that the quoted benchmarks from Acunu are misleading; they don't measure the performance of CQL with prepared statements. See, for example, the graphs at https://issues.apache.org/jira/browse/CASSANDRA-3634 (probably the same data set on which the Acunu post is based, since Eric Evans wrote both). There have also been some improvements to CQL parsing and execution speed in the last year. It is not likely that you will observe any real speed difference between CQL 3 and Thrift.

Finally, I don't think I even agree that Thrift is more flexible. The CQL 3 data model allows using the same data structures that Thrift does for nearly all usages that are not antipatterns; it just allows you to think about the model in a more organized way. For example, Lyuben mentioned rows with differing numbers of columns. A CQL 3 table may still utilize that capability: there is a difference between "storage engine rows" (which is Cassandra's low-level storage, and what Thrift uses directly) and "CQL rows" (what you see through the CQL interface). CQL just does the extra work necessary to visualize wide storage engine rows as structured tables.
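The distinction between storage engine rows and CQL rows can be sketched roughly as follows. This is a deliberately simplified, hypothetical model and not Cassandra's actual storage code: the class `WideRowSketch`, the method `toCqlRows`, and the `"<clustering value>:<column name>"` cell-name encoding are all assumptions made for illustration. It takes the cells of one wide storage engine row and groups them into several CQL rows, which may each carry a different set of columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model; NOT Cassandra's real storage code.
final class WideRowSketch {

    // Groups the cells of ONE storage engine row (one partition) into CQL rows.
    // Assumption: each cell name has the form "<clustering value>:<CQL column name>".
    static Map<String, Map<String, String>> toCqlRows(Map<String, String> cells) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            int sep = cell.getKey().indexOf(':');
            String clustering = cell.getKey().substring(0, sep);
            String column = cell.getKey().substring(sep + 1);
            // All cells sharing a clustering value form one CQL row.
            rows.computeIfAbsent(clustering, k -> new LinkedHashMap<>())
                .put(column, cell.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        // One wide storage engine row, as a Thrift client would see it.
        Map<String, String> cells = new TreeMap<>();
        cells.put("1363115059:body", "hello");
        cells.put("1363115059:subject", "greeting");
        cells.put("1363115100:body", "no subject here");

        // Prints: {1363115059={body=hello, subject=greeting}, 1363115100={body=no subject here}}
        System.out.println(toCqlRows(cells));
    }
}
```

Note that the second CQL row has no `subject` cell at all, so the "differing numbers of columns" capability is still there; it is just presented as a sparse table.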

It's a little difficult to explain in a quick SO answer, but see this post for a somewhat gentle explanation.

Sorry, but I don't agree that CQL is as flexible as the Thrift API. It's really odd, and makes no sense, to move Cassandra away from being a schema-free NoSQL store. – arganzheng, Jun 14 '19 at 12:22

score 16 · Accepted Answer · edited Apr 04 '15 at 21:18

Querying
In CQL you can query Cassandra and get data in a couple of lines (using the JDBC driver):

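// `con` is an already-open java.sql.Connection from the Cassandra JDBC driver
String query = "SELECT * FROM message;";
PreparedStatement statement = con.prepareStatement(query);
ResultSet rows = statement.executeQuery(); // run the query and fetch the rows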

With Thrift-based APIs it's a bit more complicated (example with Astyanax):

OperationResult<ColumnList<String>> result = 
     keyspace.prepareQuery(mail/*specify columnfamily structure*/)
             .getKey("lyuben@1363115059").execute();
ColumnList<String> columns = result.getResult();

Performance
Based on the benchmarks carried out by Acunu, Thrift (RPC) is slightly ahead of CQL when it comes to query performance, but you need to be in a situation where high throughput is key for this performance advantage to have a significant benefit.

Some excellent articles to look up are:

EDIT

The above benchmarks are outdated; the paul provided newer benchmarks that use prepared statements.

edited Apr 04 '15 at 21:18

Natan Streppel

answered Mar 29 '13 at 10:30

Lyuben Todorov

4

CQL still allows you to have variable columns in the underlying data structure; it just exposes them through patterns, such as maps, sets, and lists, which under the hood are all implemented as variable column names. CQL3 tries to take the things people used variable columns for and expose them as higher-level concepts. The following article goes over some of that: http://www.datastax.com/dev/blog/thrift-to-cql3 – Zanson Mar 30 '13 at 16:21
@Zanson Did you read my answer? Notice how that article is a link in it. – Lyuben Todorov Mar 30 '13 at 16:38
1

Yes, I read it. You linked the article, but didn't mention any of the points it makes. And you called out "no variable column names" as a downside of CQL, when CQL provides functionality to do most of the things variable column names are useful for, without needing them. – Zanson Apr 01 '13 at 14:16
5

I'm with Zanson -- "CQL does not [support the schemaless model]" -- is, to be blunt, incorrect. The confusion arises because Thrift and CQL use the same term, "column," to mean two different things. By this, Thrift means a raw storage engine cell, which can be one or more CQL columns. CQL gives you the same power of Cassandra's sparse storage engine, but exposes it in a way that eliminates a lot of the boilerplate involved (e.g.: Collections) and gives us a much cleaner base for further improvements down the road. Source: I am the author of two of the linked blog posts. – jbellis Apr 02 '13 at 13:08
2

@jbellis So am I correct in thinking that if we have a cql3 table and we wish to add new columns that don't exist, we need to first modify the table's schema to allow this? (That is what I meant by less flexible, that you need to go back and modify the table's schema before adding new columns and I understand this improves data consistency) – Lyuben Todorov Apr 03 '13 at 12:31
