Show contents of Lucene index

Question

I am trying to debug indexing documents in Lucene, and I need to see the contents of the index so I can see exactly how the documents got indexed. Allegedly Luke does this, but there is no documentation for it whatsoever, and when I point it at the index directory (at any of them, although I don't know why it can't figure out which one is right on its own), I get nothing. Surely there is some simple way to do this?

Ok, after a few days of chewing at this, as far as I can tell the fact that this is actually Elasticsearch wrapping Lucene is why Luke can't read the index, and apparently there is just flat no way to show the contents of the index. Bummer. — cbmanica, Jan 08 '13 at 18:45
No, ES uses normal Lucene indices ... you must have the wrong version. Download these for a current version: https://github.com/DmitryKey/luke/releases — , Dec 03 '16 at 21:34

score 15 · Answer 1 · edited Aug 16 '23 at 14:36

Luke IS the simple way to do it. You run it, browse to the index, and are off to the races. Couldn't be easier.

There are other tools out there. LIMO is also a nice tool for this, but it is harder to get started with than Luke.

Perhaps if you give some details on the problem you are running into with Luke, you will be able to get some help with that.
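
If the difficulty is working out which directory to browse to, one thing worth checking is which directories actually hold a Lucene index. A Lucene index directory contains a commit file named `segments_N`, and Elasticsearch keeps each shard as its own Lucene index, which may be why several candidate directories show up. Below is a rough sketch, not a definitive recipe: `find_lucene_indexes` is a hypothetical helper name, and the `~/elasticsearch/data` path in the usage comment is an assumption taken from the comments under this answer.

```shell
#!/bin/sh
# find_lucene_indexes is a hypothetical helper, not part of Luke or Lucene.
# It prints every directory under the given root that contains a Lucene
# commit file (segments_N), one directory per line, without duplicates.
find_lucene_indexes() {
    root=$1
    find "$root" -type f -name 'segments_*' -exec dirname {} \; | sort -u
}

# Usage (the data path is an assumption, adjust it to your install):
#   find_lucene_indexes ~/elasticsearch/data
```

Each directory it prints is a candidate to open in Luke. If Luke still shows nothing there, the Luke and Lucene versions may not match.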

edited Aug 16 '23 at 14:36 by Henry Ecker
answered Jan 05 '13 at 00:29 by femtoRgon

Really nothing more to say other than 1) I know there's data indexed, because search results come back; 2) there are four different index directories (for what reason I don't know), and I've tried pointing at all of them; 3) Luke shows no records in any of those directories. Now, this is the Elasticsearch wrapper for Lucene, so I suppose it could be stuffing data in some insane place I'm not looking, but I'm assuming that this wasn't written by evil gnomes... – cbmanica Jan 05 '13 at 00:36
I believe the index directory in ElasticSearch is configured in an entry like `data: /var/data/elasticsearch`, per the [configuration docs](http://www.elasticsearch.org/guide/reference/setup/configuration.html). Is that where you have looked? – femtoRgon Jan 05 '13 at 00:42
This is installed on OSX on a user account, so the data is in ~/elasticsearch/data, unless I'm grievously deceived. – cbmanica Jan 05 '13 at 00:50
@pierocy - Well, I haven't tried to be sure, but I believe [luke-5.2.0](https://github.com/DmitryKey/luke/releases) should do just fine with Lucene 5.2.0. As for 5.3.0, it was only released a week ago. If there are changes that need to be made to support it, there is probably going to be a bit of lag time. – femtoRgon Aug 31 '15 at 09:54
Luke has been an Apache `Lucene` module since Lucene `8.1`. We can download the Lucene binary release package to get the latest Luke: https://lucene.apache.org/core/downloads.html – Happy Feb 27 '23 at 08:12

score 3 · Answer 2 · answered Jan 05 '13 at 00:33

I don't know much about Luke, but I have worked with Lucene a lot. To see what is indexed may be tricky, even with Luke, because you can only see the data for stored fields.

For the last Lucene project I did (Solr actually), I had virtually every field marked as indexed but not stored. For those cases, to test if a document had the right indexed term, I would query the index for documents with the given primary key and the expected term. If it matches, then I know it indexed it with that term.

For example, to see if product 5 is in English, I would query productId:5 AND lang:en

I know this doesn't directly answer your question about how to use Luke, but this may be an alternative if Luke can't help you.

I'll keep that in mind in case it's helpful later, although in this case unfortunately it wasn't very useful. Thanks though. — cbmanica, Jan 09 '13 at 01:20

score 1 · Answer 3 · answered Jan 07 '13 at 12:10

Luke tries to show the values in fields that are indexed but not stored when you use the "Reconstruct & Edit" button from the "Documents" tab. If I recall right, stop words do not show up in the "Reconstruct & Edit" display -- you see things like "null_1", "null_2", etc.

Kewl_guy89 · Answer 4 · 2016-07-06T14:58:04.350

It is possible to compile Luke from source while adding the ElasticSearch postings formats to Luke's META-INF/services.

Just follow this approach

Using Luke with ElasticSearch

This approach can also be followed to test custom postings formats / codecs with Lucene.

ElasticSearch uses a custom postings format (the postings format defines how the inverted index is represented in memory / on disk), and Luke doesn’t know about it. To tell Luke about the ES postings format, add the SPI class by following the steps below.

1. Clone the Luke source repository.

2. Add a dependency on your required version of ElasticSearch to the Luke project’s pom file:

<!-- ElasticSearch -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.1.1</version>
</dependency>

3. Compile the Luke jar file (creates target/luke-with-deps.jar):

$ mvn package

4. Unpack Luke’s list of known postings formats to a temporary file:

$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive:  target/luke-with-deps.jar
  inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

5. Add the ElasticSearch postings formats to the temp file:

$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

6. Repack the modified file back into the jar:

$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat

7. Run Luke:

$ ./luke.sh
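
The step that adds the ElasticSearch postings formats to the temp file can leave duplicate entries if it is run more than once. As a hedged sketch only, the function below adds each class name to a services file only when that exact line is not already present. `add_spi_entries` is a hypothetical helper name that is not part of Luke, and the class names in the usage comment are simply the ones listed in the steps above.

```shell
#!/bin/sh
# add_spi_entries is a hypothetical helper, not part of Luke or Lucene.
# Usage: add_spi_entries SERVICES_FILE CLASS_NAME...
# Appends each class name to the services file, one per line, skipping
# names that are already listed so reruns do not create duplicates.
add_spi_entries() {
    services_file=$1
    shift
    mkdir -p "$(dirname "$services_file")"
    touch "$services_file"
    # If the file is non-empty and lacks a trailing newline, add one so the
    # first appended name does not get glued onto the last existing line.
    if [ -s "$services_file" ] && [ -n "$(tail -c 1 "$services_file")" ]; then
        echo >> "$services_file"
    fi
    for class_name in "$@"; do
        # -x matches whole lines only; -F treats the dots literally
        if ! grep -qxF "$class_name" "$services_file"; then
            printf '%s\n' "$class_name" >> "$services_file"
        fi
    done
}

# Example call, using the path and class names from the steps above:
#   add_spi_entries ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat \
#       "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" \
#       "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" \
#       "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat"
```

The repack step with `jar -uf` stays the same. The helper only changes how the temp file is edited.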
