
Possible Duplicate:
How does lucene index documents?

Lucene allows some (or all) of a document's fields to be indexed without being stored, so that a search returns only an ID that can then be used to query a database for the actual information. However, to be able to search by those fields, Lucene must still store them somehow.
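A toy model of the index-only pattern described above may help frame the question. It assumes a simple in-memory inverted index, not Lucene's real on-disk format: the text is split into terms that are indexed but never stored as a whole, and a stored `db_id` is the only value handed back. The names (`ToyIndex`, `add`, `search`, `db_id`) are hypothetical.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory index: text is indexed only, the DB id is stored."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> ascending internal doc numbers
        self.stored = []  # internal doc number -> stored fields (just the DB id)

    def add(self, db_id, body):
        doc = len(self.stored)
        self.stored.append({"db_id": db_id})  # only the ID is kept as a stored value
        # Naive "analysis": lowercase and split on whitespace, one entry per term.
        for term in sorted(set(body.lower().split())):
            self.postings[term].append(doc)

    def search(self, term):
        # Match against the indexed terms, but return only the stored DB ids.
        docs = self.postings.get(term.lower(), [])
        return [self.stored[doc]["db_id"] for doc in docs]


index = ToyIndex()
index.add(101, "Patient has diabetes")
index.add(202, "Patient is healthy")
print(index.search("patient"))  # [101, 202]
print(index.search("diabetes"))  # [101]
```

Even in this toy, the original text is never kept verbatim, yet every one of its terms survives as a key in `postings`; that is what makes the field searchable at all.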

How exactly is this done? Are the indexed-only fields combined into a hash or a tree-like structure used then to search against? Is there any documentation available on how Lucene searches these indexed-only fields?
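On the "hash or tree" part: the Lucene 3.x file-format page linked in the comments describes a term dictionary of terms sorted by field and text (`.tis`) plus a smaller index holding every Nth term (`.tii`), which points toward a sorted structure rather than a hash. The sketch below is a simplified, hypothetical model of that idea (a sorted term list plus a sparse index searched with `bisect`), not the real file layout; `TermDictionary` and `interval` are illustrative names.

```python
import bisect


class TermDictionary:
    """Hypothetical sorted term dictionary with a sparse index of every Nth term."""

    def __init__(self, terms_to_docs, interval=4):
        # Terms sorted lexicographically, like a sorted on-disk term file.
        self.terms = sorted(terms_to_docs)
        self.postings = [terms_to_docs[t] for t in self.terms]
        # Sparse index: every `interval`-th term, small enough to keep in memory.
        self.index_terms = self.terms[::interval]
        self.interval = interval

    def lookup(self, term):
        # Binary-search the sparse index for the block that could contain `term`.
        block = bisect.bisect_right(self.index_terms, term) - 1
        if block < 0:
            return None
        start = block * self.interval
        # Scan at most `interval` sorted terms, as a reader would after seeking.
        for i in range(start, min(start + self.interval, len(self.terms))):
            if self.terms[i] == term:
                return self.postings[i]
        return None


d = TermDictionary(
    {"apple": [0], "banana": [1, 3], "cherry": [2],
     "date": [0, 2], "elder": [3], "fig": [1]},
    interval=2,
)
print(d.lookup("date"))  # [0, 2]
print(d.lookup("zebra"))  # None
```

Keeping terms sorted is what makes prefix and range lookups cheap, something a plain hash would not offer. Note that in this toy model the terms themselves sit in the dictionary verbatim.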

My primary concern is understanding how securely the indexed data is stored in a highly sensitive environment; in other words, how hard or easy it is to retrieve the indexed fields, associate them with documents, and consequently link them to the other fields of those documents.
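To make that concern concrete, here is a sketch, again on a toy inverted index rather than Lucene's real files. Anyone who can read the postings (term → document numbers) can invert them and recover the bag of indexed terms for every document, even though the field was never stored, and then join that bag with stored values such as the ID. The helper `reconstruct` and the sample data are hypothetical.

```python
from collections import defaultdict


def reconstruct(postings, stored_ids):
    """Invert term -> doc postings into db_id -> sorted indexed terms (hypothetical helper)."""
    per_doc = defaultdict(list)
    for term, docs in postings.items():
        for doc in docs:
            per_doc[doc].append(term)
    # Join the recovered terms with each document's stored ID.
    return {stored_ids[doc]: sorted(terms) for doc, terms in per_doc.items()}


# Postings as they might be read out of an index whose text field is indexed only.
postings = {
    "diabetes": [0],
    "has": [0],
    "healthy": [1],
    "is": [1],
    "patient": [0, 1],
}
stored_ids = [101, 202]  # the only stored field: the database ID
print(reconstruct(postings, stored_ids))
# {101: ['diabetes', 'has', 'patient'], 202: ['healthy', 'is', 'patient']}
```

Word order and repeat counts are missing here only because the toy model keeps neither positions nor frequencies; an index that records them would leak correspondingly more.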

Asked by rae1
  • http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html – I4V Jan 23 '13 at 16:59
  • @I4V The link does not address how index-only fields are stored in the index. – rae1 Jan 23 '13 at 18:29
  • @mindas That question referred to how the indexing is performed, rather than how it's stored. My primary concern is understanding how securely the indexed data is stored in a highly sensitive environment. – rae1 Jan 23 '13 at 18:41
  • Can I ask the reason for the `close` votes? – rae1 Jan 23 '13 at 19:57
  • Maybe you want to be a little bit more explicit and ask a separate question just about storing and not indexing. Also worth reading this: http://www.codinghorror.com/blog/2012/03/rubber-duck-problem-solving.html – mindas Jan 24 '13 at 14:27
  • @mindas I'm sorry, but nowhere in the question do I ask about indexing. I'm not concerned about it. I'm asking specifically about how Lucene stores its index-only values, and the answers to the 'duplicate' question in no way address that. – rae1 Jan 24 '13 at 15:31
  • Ok, here you go - "Are the indexed-only fields combined into a hash or a tree-like structure used then to search against?". I suggest you don't take this personally, just tweak the wording, make it more explicit and move on. – mindas Jan 24 '13 at 15:33
  • @mindas Yes, but I'm concerned with how these are stored in the file system, not how the indexing or the retrieval works. I'm not asking how Lucene breaks the documents into terms, or how it uses these data structures to retrieve the documents from terms. – rae1 Jan 24 '13 at 15:36
  • @mindas Think of this scenario: you open up a `.frq` file: what's in it? in what format? any encoding? any particular data structure? are these encrypted? – rae1 Jan 24 '13 at 15:41
  • @mindas I added the last paragraph (before closing) to illustrate the context and focus of the question, and what I expect an answer to cover. – rae1 Jan 24 '13 at 15:43
  • Look, if this is your question ("you open up a .frq file: what's in it? in what format? any encoding?"), just phrase it exactly like this and open a new question. I doubt enough people will come back to this question and vote for reopening it. Anyway, I'm off and this is my last post on this topic. I have nothing else to say I'm afraid. – mindas Jan 24 '13 at 15:46
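On the `.frq` question: according to the Lucene 3.x file-format page linked in the first comment, `.frq` holds, for each term, a list of document-delta/frequency entries written as VInts (7 data bits per byte, low-order group first, high bit set on every byte except the last). When DocDelta is odd, the frequency is one; when it is even, the frequency follows as another VInt. The sketch below decodes one such list. It assumes term frequencies were not omitted and ignores skip data and positions (kept in `.prx`), so it illustrates the encoding rather than reading real segment files. `read_vint` and `decode_term_freqs` are hypothetical names. VInt is compact integer encoding, not encryption.

```python
def read_vint(data, pos):
    """Read one VInt: 7 data bits per byte, low-order group first, high bit = more bytes."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_term_freqs(data, doc_freq):
    """Decode doc_freq (doc number, frequency) pairs from one TermFreqs list."""
    pairs = []
    doc = 0
    pos = 0
    for _ in range(doc_freq):
        code, pos = read_vint(data, pos)
        doc += code >> 1  # DocDelta / 2 is the gap from the previous doc number
        if code & 1:
            freq = 1  # odd DocDelta: frequency is one
        else:
            freq, pos = read_vint(data, pos)  # even DocDelta: frequency follows
        pairs.append((doc, freq))
    return pairs


# A term occurring once in doc 7 and three times in doc 11 encodes as 15, 8, 3.
print(decode_term_freqs(bytes([15, 8, 3]), 2))  # [(7, 1), (11, 3)]
```

Note that the entries carry only document numbers and counts; the term text they belong to lives in the term dictionary.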

0 Answers