Lucene provides a bitset of all non-deleted documents, called liveDocs. You can get it by iterating over all LeafReaders (or using the SlowCompositeReaderWrapper) and calling the getLiveDocs method, or by using the MultiFields class. Note that the returned bitset is null when there are no deletions.
Once you have this bitset, you can iterate from 0 (inclusive) to IndexReader#maxDoc (exclusive) and consult the bitset to know whether a docid represents a deleted document or a live one. You can access all stored fields of a deleted document just as you would from a live one.
However, once a segment gets merged, its deleted documents are physically removed from the index and can no longer be retrieved.
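The scan described above can be sketched without Lucene on the classpath. This is only an illustration of the loop, not Lucene's own API: the class LiveDocsScan and the method deletedDocIds are hypothetical names, and an IntPredicate stands in for the liveDocs bitset (null meaning "no deletions", as with getLiveDocs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Lucene: models the liveDocs scan for one segment.
final class LiveDocsScan {

    /**
     * Returns the segment-local doc ids in [0, maxDoc) that are marked deleted.
     *
     * @param maxDoc one greater than the largest doc id in the segment
     * @param isLive stand-in for the liveDocs bitset; null means no deletions
     */
    static List<Integer> deletedDocIds(int maxDoc, IntPredicate isLive) {
        List<Integer> deleted = new ArrayList<>();
        if (isLive == null) {
            return deleted; // no liveDocs bitset: every doc is live
        }
        for (int docId = 0; docId < maxDoc; docId++) {
            if (!isLive.test(docId)) {
                deleted.add(docId); // bit not set: the doc is deleted
            }
        }
        return deleted;
    }
}
```

With a real index you would presumably call this once per leaf, passing leaf.maxDoc() and liveDocs::get (where liveDocs is the Bits returned by that leaf's getLiveDocs()), and add the docBase of the leaf's LeafReaderContext if you need index-wide doc ids. For example, deletedDocIds(5, id -> id != 1 && id != 3) returns [1, 3].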