4

I inserted 10K entries into a Cassandra table with a TTL of 1 minute, all under a single partition.
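A simplified sketch of the schema and insert (the column names and types here are illustrative; the real table has the clustering columns visible in the error below):

CREATE TABLE qcs.job (
    name     text,
    run_ts   timestamp,
    attempt  int,
    job_type text,
    job_id   text,
    PRIMARY KEY ((name), run_ts, attempt, job_type, job_id)
);

-- repeated ~10K times with different clustering values, all under the same partition key
INSERT INTO qcs.job (name, run_ts, attempt, job_type, job_id)
VALUES ('job', '2018-04-04 11:19:00+0530', 1, 'jobType1522820944168', 'jobId1522820944168')
USING TTL 60;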

After the insert succeeded, I tried to read all the data from that single partition, but it throws an error like the one below:

WARN  [ReadStage-2] 2018-04-04 11:39:44,833 ReadCommand.java:533 - Read 0 live rows and 100001 tombstone cells for query SELECT * FROM qcs.job LIMIT 100 (see tombstone_warn_threshold)
DEBUG [Native-Transport-Requests-1] 2018-04-04 11:39:44,834 ReadCallback.java:132 - Failed; received 0 of 1 responses
ERROR [ReadStage-2] 2018-04-04 11:39:44,836 StorageProxy.java:1906 - Scanned over 100001 tombstones during query 'SELECT * FROM qcs.job LIMIT 100' (last scanned row partion key was ((job), 2018-04-04 11:19+0530, 1, jobType1522820944168, jobId1522820944168)); query aborted

I understand that a tombstone is a marker in the SSTable, not an actual delete.

So I performed compaction and repair using nodetool.
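The commands were along these lines (keyspace and table names taken from the query in the log; exact options may have differed):

$ nodetool compact qcs job    # major compaction on that table
$ nodetool repair qcs         # repair the keyspace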

Even after that, when I read the data from the table, it throws the same error in the log file.

1) How do I handle this scenario?

2) Could someone explain why this happened, and why compaction and repair didn't solve the issue?

Vadim Kotov
Harry

2 Answers

3

Tombstones are only really deleted after the period specified by the table's gc_grace_seconds setting (10 days by default). This is done to make sure that any node that was down at the time of the deletion will pick up these changes after it recovers. Here are blog posts that discuss this in great detail: one from thelastpickle (recommended), 1, 2, and the DSE documentation or Cassandra documentation.

You can set the gc_grace_seconds option on the individual table to a lower value to remove deleted data faster, but this should be done only for tables with TTLed data. You may also need to tweak the tombstone_threshold & tombstone_compaction_interval table options to make tombstone compactions happen sooner. See this document or this document for a description of these options.
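As a sketch against the table from the question (the one-hour grace period, the SizeTieredCompactionStrategy class, and the threshold values below are assumptions to tune for your own cluster):

-- safe to lower only because this table holds TTLed data with no explicit deletes
ALTER TABLE qcs.job WITH gc_grace_seconds = 3600;

ALTER TABLE qcs.job WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.1',
    'tombstone_compaction_interval': '3600'
};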

Alex Ott
  • please check this : https://stackoverflow.com/questions/49878072/distributed-logs-in-cassandra – Harry Apr 17 '18 at 19:16
  • please check this : https://stackoverflow.com/questions/50385262/cassandra-commit-log-size – Harry May 17 '18 at 07:24
  • I tried to google it and found very little on this: https://stackoverflow.com/questions/50385262/cassandra-commit-log-size so could you shed some light on it – Harry May 17 '18 at 08:04
  • please check this : https://stackoverflow.com/questions/50462617/native-transport-request-in-cassandra – Harry May 22 '18 at 08:01
0

Newer Cassandra versions support:

$ ./nodetool garbagecollect

After this command, flush memory to disk before restarting:

$ ./nodetool drain    # closes client connections; after this, clients cannot access the node

Shut down Cassandra and restart it again. You should restart after the drain.

** You do not need to drain; it depends on the situation. This is extra information.

KUTAY ZORLU
  • You do not need to shut down, but it's best to drain and restart it. – KUTAY ZORLU Jan 17 '20 at 09:11
  • 1
    brrrrr - the recommendation to restart the node is very bad. You shouldn't need it. – Alex Ott Jan 17 '20 at 10:16
  • This depends on the situation; in most cases it is not good, so you do not need to restart. A user should know what the DRAIN command does; if you do not know the drain command, do not use it. – KUTAY ZORLU Jan 17 '20 at 17:03