Cassandra CQL retrieve various rows from list of primary keys

Question

I have at a certain point in my software a list of primary keys of which I want to retrieve information from a massively huge table, and I'm wondering what's the most practical way of doing this. Let me illustrate:

Let this be my table structure:

CREATE TABLE table_a(
    name text,
    date datetime,
    key int,
    information1 text,
    information2 text,
    PRIMARY KEY ((name, date), key)
)

say I have a list of primary keys:

list = [['Jack', '2015-01-01 00:00:00', 1],
       ['Jack', '2015-01-01 00:00:00', 2],
       ['Richard', '2015-02-14 00:00:00', 5],
       ['David', '2015-01-01 00:00:00', 9],
        ...
       ['Last', '2014-08-13 00:00:00', 12]]

Say this list is huge (hundreds of thousands) and not ordered in any way. I want to retrieve, for every key on the list, the value of the information columns.

As of now, the way I'm solving this issue is executing a select query for each key, and that has been sufficient hitherto. However I'm worried about execution times when the list of keys get too huge. Is there a more practical way of querying cassandra for a list of rows of which I know the primary keys without executing one query per key?

If the key was a single field, I could use the select * from table where key in (1,2,6,3,2,4,8) syntax to obtain all the keys I want in one query, however I don't see how to do this with composite primary keys.

Any light on the case is appreciated.

Possible duplicate of [What is the best way to read data from Cassandra in parallel?](https://stackoverflow.com/questions/36690811/what-is-the-best-way-to-read-data-from-cassandra-in-parallel) — rogerdpack, Apr 13 '18 at 21:30

score 1 · Accepted Answer · answered May 20 '16 at 18:41

The best way to go about something like this, is to run these queries in parallel. You can do that on the (Java) application side by using async futures, like this:

Future<List<ResultSet>> future = ResultSets.queryAllAsList(session,
    "SELECT * FROM users WHERE id=?",
      UUID.fromString("0a63bce5-1ee3-4bbf-9bad-d4e136e0d7d1"),
      UUID.fromString("7a69657f-39b3-495f-b760-9e044b3c91a9")
);

for (ResultSet rs : future.get()) {
    ... // process the results here
}

score 0 · Answer 2 · answered May 19 '16 at 21:38

0

Create a table that has the 3 columns worth of data piped together into a single value and store that single string value in a single column. Make that column the PK. Then you can use the IN clause to filter. For example, select * from table where key IN ('Jack|2015-01-01 00:00:00|1', 'Jack|2015-01-01 00:00:00|2').

Hope that helps! Adam

answered May 19 '16 at 21:38

Adam Hutson

66
2

Hi Adam, thanks for the suggestion! I had actually thought of this, and it does solve the problem, however changing the primary key columns of the table is not possible without dumping all the data out, creating a new table then dumping in again on the new structure. Also it requires changing all the code that interacts with this table to comply with the new structure. I appreciate the help though, really, thanks. – Lucas Franceschi May 20 '16 at 13:44

Cassandra CQL retrieve various rows from list of primary keys

2 Answers2