16

I know that there are TTLs on columns in Cassandra. But is it also possible to set a TTL on a row? Setting a TTL on each column doesn't solve my problem, as can be seen in the following use case:

At some point a process wants to delete a complete row with a TTL (let's say row "A" with TTL 1 week). It could do this by replacing all existing columns with the same content but with a TTL of 1 week.

But there may be another process running concurrently on that row "A" which inserts new columns or replaces existing ones without a TTL because that process can't know that the row is to be deleted (it runs concurrently!). So after 1 week all columns of row "A" will be deleted because of the TTL except for these newly inserted ones. And I also want them to be deleted.

So is there or will there be Cassandra support for this use case or do I have to implement something on my own?

Kind Regards
Stefan

snd
  • As mentioned in an answer to Richard, we don't really need a TTL on all data of a row. As can be seen in the discussion, for us it is sufficient to delete all data up to a given timestamp in the future (i.e. 1 week) and have a TTL on one column. – snd May 16 '13 at 14:28

3 Answers

13

There is currently no way of setting a TTL on a row in Cassandra. TTLs are designed for deleting individual columns whose lifetime is known at the time they are written.

You could achieve what you want by delaying your process: instead of writing with a TTL of 1 week, run it a week later and delete the row. Row deletes have the following semantics: any column inserted just before the delete is removed, but columns inserted just after it are not.

If columns that are inserted in the future still need to be deleted, you could insert a row delete with a timestamp in the future to ensure this. But be very careful: if you later wanted to insert into that row, you couldn't. Columns would simply disappear when written to that row (until the tombstone is garbage collected).
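To make the timestamp semantics of row deletes concrete, here is a small toy model in Python. It is only a sketch of the rule "a row delete with timestamp t hides every column with timestamp <= t", not Cassandra's actual implementation; the class `ToyRow` and its methods are hypothetical names invented for this illustration, and timestamp ties between writes are simplified.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyRow:
    """Toy model of one row: cells plus an optional row tombstone (not Cassandra code)."""
    # column name -> (value, write timestamp)
    cells: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    tombstone_ts: Optional[int] = None

    def write(self, name: str, value: str, ts: int) -> None:
        # Last write wins per column, decided by timestamp (ties simplified here).
        current = self.cells.get(name)
        if current is None or ts >= current[1]:
            self.cells[name] = (value, ts)

    def delete_row(self, ts: int) -> None:
        # Only the newest row tombstone matters.
        if self.tombstone_ts is None or ts > self.tombstone_ts:
            self.tombstone_ts = ts

    def read(self) -> Dict[str, str]:
        # A column is visible only if it was written strictly after the tombstone.
        return {
            name: value
            for name, (value, ts) in self.cells.items()
            if self.tombstone_ts is None or ts > self.tombstone_ts
        }


WEEK = 7 * 24 * 3600
now = 1_000_000

row = ToyRow()
row.write("a", "old", now - 10)         # written before the delete
row.delete_row(now + WEEK)              # row delete with a timestamp 1 week ahead
row.write("b", "concurrent", now + 60)  # concurrent writer inside the week
row.write("c", "later", now + WEEK + 1) # writer after the tombstone timestamp

visible = row.read()
print(visible)
```

In this model the old column and the concurrent write both vanish immediately, and only the write made after the tombstone timestamp is readable. That is the "be very careful" part: for the whole week, writes to the row are silently shadowed.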

Richard
  • The idea of deleting with a timestamp in the future is interesting. But sadly I don't know the name of all columns which might be inserted. – snd May 16 '13 at 09:54
  • You don't need to know the names of the columns when using row deletes. – Richard May 16 '13 at 10:24
  • Aaah, ok :) I just checked it. I didn't know this would work. I think we will use it that way: We will delete the row with a timestamp in the future (1 week) and insert a DELETED-Marker with the same timestamp and a TTL which expires soon after that. So the delete in the future also deletes the updates from concurrent processes and the DELETED-Marker prevents others from inserting into a deleted row. And after the DELETED-Marker is expired the row can be used again. Nice. Thanks for that hint. – snd May 16 '13 at 13:45
  • Deleting a row with timestamp t means delete all columns with timestamp <= t. So if you delete a row with timestamp now+1 week it will delete everything now and future inserts for 1 week. It won't be like a TTL and keep the columns for one week. You need to do the delete 1 week later. – Richard May 16 '13 at 13:53
  • Yeah, but it also satisfies our use case. We only need some data with a TTL (i.e. the DELETED-Marker). The other data can stay there with a TTL or can be deleted instantly. The goal is that after a week there is no DATA left from concurrent and non-concurrent writers. – snd May 16 '13 at 14:22
  • Are you talking about deleting expired rows manually, or will Cassandra handle this? – Manish Kumar Apr 21 '14 at 08:35
  • I'm confused. Here is an example of setting a Row TTL https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_ttl_t.html – Ivan Balashov May 20 '16 at 09:28
  • It's still per cell. Of course you can set the TTL of everything in the row to the same value, but overwriting any cell would reset that cell's TTL. – Richard May 22 '16 at 18:19
8

You can set a TTL for a row in Cassandra 3 using:

INSERT INTO Counter (key, eventTime, value) VALUES ('1001', dateof(now()), 100) USING TTL 10;
DanielBarbarian
Mahesh Reddy
  • It doesn't serve the questioner's use case. If you update a column, its TTL will be changed (it will be null if you do not specify a TTL in the update query). As a result, the row will still exist with those updated columns after the TTL expires. – mhc Nov 14 '16 at 09:04
1

Although I do not recommend it, there is a Cassandra way to fix the problem:

SELECT TTL(value) FROM table WHERE ...;

Get the current TTL of a value first, then use the result to set the TTL in an INSERT or UPDATE:

INSERT ... USING TTL ttl-of-value;

So... I think that the SELECT TTL() is slow (from experience with TTL() and WRITETIME() in some of my CQL commands). Not only that, the TTL is correct at the time the select results are generated on the Cassandra node, but by the time the insert happens, it will be off. Cassandra should have offered a time to delete rather than a time to live...
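To illustrate the "time to delete" idea, here is a minimal sketch that assumes the application keeps its own absolute deadline instead of reading the TTL back from Cassandra. The function name `remaining_ttl` and the parameter `expires_at` are hypothetical, not part of any Cassandra API. As far as I know, a TTL of 0 in CQL means "no expiry", so the sketch returns None rather than 0 once the deadline has passed, meaning "do not write at all".

```python
import math
from typing import Optional


def remaining_ttl(expires_at: float, now: float) -> Optional[int]:
    """Whole seconds left until a fixed deadline, rounded up.

    Returns None when the deadline has passed, so the caller can skip
    the write instead of sending a TTL of 0.
    """
    remaining = math.ceil(expires_at - now)
    return remaining if remaining > 0 else None


deadline = 1000.0                     # absolute expiry time, in seconds
print(remaining_ttl(deadline, 400.0))
print(remaining_ttl(deadline, 999.5))
print(remaining_ttl(deadline, 1000.0))
```

The value would then be passed as the TTL of the INSERT or UPDATE. It can still be off by the time the write lands, but only by the latency of that one write, not by an extra read round trip.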

So as mentioned by Richard, having your own process to delete data after 1 week is probably safer. You should have one column to save the date of creation or the date when the data becomes obsolete. Then a background process can read that date and if the data is viewed as obsolete, drop the entire row.

Other processes can also use that date to know whether that row is considered valid or not! (So even if it was not yet deleted, you can still view the row as invalid if the date has passed.)
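A rough sketch of that idea, with rows modelled as plain dictionaries rather than real Cassandra reads. The column name `obsolete_at` and both function names are assumptions made up for this example; a real background job would fetch the rows from Cassandra and then issue row deletes for the keys returned.

```python
from typing import Dict, List

Row = Dict[str, float]


def is_valid(row: Row, now: float) -> bool:
    # Readers treat a row as invalid once its obsolescence date has passed,
    # even if the background process has not deleted it yet.
    return now < row["obsolete_at"]


def rows_to_delete(rows: Dict[str, Row], now: float) -> List[str]:
    # The background process collects the keys of every obsolete row.
    return sorted(key for key, row in rows.items() if not is_valid(row, now))


rows = {
    "A": {"obsolete_at": 100.0},
    "B": {"obsolete_at": 500.0},
}
print(rows_to_delete(rows, 200.0))
```

Because the date lives in its own column, a concurrent writer that adds other columns does not extend the row's life: the sweeper still drops the whole row once the date has passed.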

Alexis Wilke