To the best of my knowledge I should create 4 tables and make Primary keys ..."foreign keys", should not I?
You definitely do not want to do that. First of all, foreign keys do not exist in Cassandra. Secondly, what you're talking about is modeling from a relational standpoint. With Cassandra, you don't want data for one query spread across multiple tables, because that could be spread across multiple nodes. And querying multiple nodes introduces more network time into the equation, which is slow.
In Cassandra, you want to take a query-based modeling approach. Sometimes that can mean one table for each query. Bearing that in mind, I am hearing that you need to query your properties two different ways:
- By "house" (MLS?)
- By city
Essentially, you should have a table to serve each of those queries:
CREATE TABLE housesbymls (
mls text,
city text,
price bigint,
propertytype text,
state text,
street text,
year bigint,
zip text,
PRIMARY KEY (mls));
CREATE TABLE housesByCity (
mls text,
street text,
city text,
state text,
zip text,
propertyType text,
price bigint,
year bigint,
PRIMARY KEY ((state,city),zip,mls));
After upserting some data, I can query by MLS:
aploetz@cqlsh:stackoverflow> SELECT * FROM housesByMLS WHERE mls='09110857';
mls | city | price | propertytype | state | street | year | zip
----------+----------+--------+------------------+-------+--------------------------------+------+-------
09110857 | Palatine | 104900 | Condominium Unit | IL | 1025 North Serling Avenue, 211 | 1978 | 60067
(1 rows)
And I can query by state/city or state/city/zip:
aploetz@cqlsh:stackoverflow> SELECT * FROM housesByCity
WHERE state='IL' AND city='Palatine';
or:
aploetz@cqlsh:stackoverflow> SELECT * FROM housesByCity
WHERE state='IL' AND city='Palatine' AND zip='60067';
Both of those return:
state | city | zip | mls | price | propertytype | street | year
-------+----------+-------+----------+--------+------------------+--------------------------------+------
IL | Palatine | 60067 | 09110857 | 104900 | Condominium Unit | 1025 North Serling Avenue, 211 | 1978
(1 rows)
The idea behind the PRIMARY KEY structure on this one, is that state
and city
make up the partitioning key (which helps Cassandra figure out where to put that row in the cluster) so they are both required. Then, as cities can have multiple zip
codes, you can also focus your query on that. PRIMARY KEYs in Cassandra are unique, so I put mls
on the end to ensure uniqueness.