
I am following the hadoop_cql3_word_count example in Cassandra and have questions with the following code segment:

    String query =
        "UPDATE " + KEYSPACE + "." + OUTPUT_COLUMN_FAMILY +
        " SET count_num = ? ";
    CqlConfigHelper.setOutputCql(job.getConfiguration(), query);

My questions are:

  1. What is the definition of the question mark (i.e., ?) in the above query? Does Cassandra process it in a way such that the question mark is replaced by some value?
  2. If I would like to update multiple columns of a row given its key, how should I modify the above update statement?

Thank you,

keelar

1 Answer


The ? is a placeholder for a bind variable in a prepared statement. As your MapReduce job writes its output, the values each reducer emits are bound to the ?s in order.

If your MR results looked like (key=key1, 1), (key=key2, 2), (key=key3, 3)

Then the statements executed would be

    UPDATE Keyspace.columnfamily SET count_num = 1 WHERE key=key1
    UPDATE Keyspace.columnfamily SET count_num = 2 WHERE key=key2
    UPDATE Keyspace.columnfamily SET count_num = 3 WHERE key=key3
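To make the "in order" part concrete, here is a minimal, self-contained sketch that mimics positional binding with plain string substitution. `BindSketch` and `render` are hypothetical names made up for this illustration; the real connector binds values through a prepared statement rather than rewriting the query text, and it adds the WHERE clause itself.

```java
import java.util.List;

public class BindSketch {
    // Replaces each '?' in the template with the next value, left to right.
    // Illustration only: a real prepared statement binds values, it does not edit text.
    static String render(String template, List<?> values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : template.toCharArray()) {
            if (c == '?') {
                if (next >= values.size()) {
                    throw new IllegalArgumentException("more placeholders than values");
                }
                out.append(values.get(next++));
            } else {
                out.append(c);
            }
        }
        if (next != values.size()) {
            throw new IllegalArgumentException("more values than placeholders");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the SET part is written by the user; the WHERE clause is left out here.
        String template = "UPDATE Keyspace.columnfamily SET count_num = ?";
        System.out.println(render(template, List.of(1)));
    }
}
```

The point of the sketch is only that the first value goes to the first ?, the second value to the second ?, and so on, and that the number of values has to match the number of placeholders.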

To update multiple columns, add more ? placeholders to the prepared statement and make sure your MapReduce job supplies a value for each of them, in the same order.

In the word count example:

    keys.put("row_id1", ByteBufferUtil.bytes(partitionKeys[0]));
    keys.put("row_id2", ByteBufferUtil.bytes(partitionKeys[1]));
    ...
    keys.put("word", ByteBufferUtil.bytes(word.toString()));
    variables.add(ByteBufferUtil.bytes(String.valueOf(sum)));         

    ...
    context.write(keys, getBindVariables(word, sum));

This makes the reducer output look like ({row_id1=1,row_id2=3,word=pizza},4)

And the prepared statement will be executed like

    UPDATE cql3_worldcount.output_words SET count_num = 4 WHERE row_id1=1 AND row_id2=3 AND word=pizza;

If I wanted a prepared statement with multiple columns it would look like

    UPDATE test SET a = ?, b = ?, c = ?, d = ?

(The connector fills in the rest: WHERE key=...)

With a real prepared statement we would also fill in the key as well, but here the connector to Cassandra will just use whatever mappings you have in your reducer output.

    ({key='mykey'},(1,2,3,4))

becomes

    UPDATE test SET a = 1, b = 2, c = 3, d = 4 WHERE key=mykey
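As a rough sketch of what the reducer would have to produce for that four-column statement, the snippet below builds the key map and the bind-variable list using only the JDK. `MultiColumnOutputSketch`, `buildKeys`, `buildVariables`, and the local `bytes` helper are hypothetical names; `bytes` stands in for `ByteBufferUtil.bytes` so the example runs without the Cassandra jars. It also assumes all four columns are text, as `count_num` is in the word count example; other column types would need their own serialization.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiColumnOutputSketch {
    // Stand-in for Cassandra's ByteBufferUtil.bytes(String) (assumption: text columns).
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Key columns: the connector uses these to build the WHERE clause.
    static Map<String, ByteBuffer> buildKeys(String key) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("key", bytes(key));
        return keys;
    }

    // Bind variables: one per '?', in the same order as the SET clause (a, b, c, d).
    static List<ByteBuffer> buildVariables(String a, String b, String c, String d) {
        List<ByteBuffer> variables = new ArrayList<>();
        variables.add(bytes(a));
        variables.add(bytes(b));
        variables.add(bytes(c));
        variables.add(bytes(d));
        return variables;
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> keys = buildKeys("mykey");
        List<ByteBuffer> variables = buildVariables("1", "2", "3", "4");
        // In a real reducer this pair would be passed to context.write(keys, variables).
        System.out.println(keys.keySet() + " -> " + variables.size() + " bind variables");
    }
}
```

The list order is what matters: the first element is bound to `a`, the second to `b`, and so on, exactly as the single `count_num` value is bound in the word count example.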

For more information on prepared statements in general, see the Stack Overflow question about prepared statements in CQL.

RussS
  • So you're saying if I would like to output multiple columns, just place multiple `?` in order in my prepared statement. Am I right on this part? – keelar Sep 26 '13 at 20:37
  • And, can I know how do you know such settings btw, I can't find any resource describing the prepared statement before :'( – keelar Sep 26 '13 at 20:38
  • I added some more information, You can also look at basic descriptions of SQL prepared statements which operate in basically the same manner. – RussS Sep 26 '13 at 21:34