I'm trying to do a simple request on two tables
What you're trying to do is known as a "distributed join" and Cassandra is specifically designed to prevent you from doing this.
The way to solve these types of problems, is with a process called denormalization. Let's say you have simple two tables carMake
and carModel
:
makeid | make
--------+--------
1 | Chevy
2 | Dodge
3 | Ford
modelid | makeid | model
---------+--------+---------
15 | 3 | Focus
11 | 3 | Mustang
32 | 2 | Charger
82 | 3 | Fusion
Now, in a traditional RDBMS if I wanted to SELECT all car models with a make of "Ford" I would execute a JOIN query. But with Cassandra, the idea is to solve this problem at the modeling stage, by building a table which supports the ability to query make and model of a car at the same time:
CREATE TABLE carMakeModel (
carid int,
make text,
model text,
PRIMARY KEY (make,carid));
aploetz@cqlsh:stackoverflow> SELECT * FROM carMakeModel WHERE make='Ford';
make | carid | model
------+-------+---------
Ford | 1 | Mustang
Ford | 2 | Focus
Ford | 3 | Fusion
(3 rows)
Some key points to note here:
make
is duplicated as much as is necessary. You'll notice that "Ford" is specified 3 times in the result set. If you had data for 13 models of Fords, you would store the value of "Ford" 13 times.
- PRIMARY KEYs in Cassandra are unique. I have
carid
added as a part of the PRIMARY KEY to ensure uniqueness for each model
, otherwise an INSERT for each make
would overwrite itself.