You are right, documentation for Lucene can be a bit challenging, because the last revision of the book Lucene in Action covered version 3.0 and very major changes were made in Lucene 4.0. I have found one book that covers Lucene 4, called Lucene 4 Cookbook, but it's not too thick and its coverage of sorting is limited to a single page, though it does provide an example.
One great source for learning about Lucene is the unit tests that ship with the project. That is where I found the example below, which shows how to store your number as a NumericDocValuesField and then sort by it. Unit tests are not typically suitable for cut-and-paste use in an application, but they do a good job of showing how a feature is used. For example, this unit test uses a RandomIndexWriter, whereas you'd use an IndexWriter.
This sorting approach leverages DocValues. One thing to remember about DocValues is that they are not stored with the document but rather are stored together, per DocValues field. This is what makes them especially suitable for sorting. But when you read the document back, the value won't be one of its fields unless you also stored it as a field in the document. This is why the example stores the value twice, once as a NumericDocValuesField and once as a StringField.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/** Tests sorting on type int */
public void testInt() throws IOException {
    Directory dir = newDirectory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), dir);

    Document doc = new Document();
    doc.add(new NumericDocValuesField("value", 300000));
    doc.add(newStringField("value", "300000", Field.Store.YES));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new NumericDocValuesField("value", -1));
    doc.add(newStringField("value", "-1", Field.Store.YES));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new NumericDocValuesField("value", 4));
    doc.add(newStringField("value", "4", Field.Store.YES));
    writer.addDocument(doc);

    IndexReader ir = writer.getReader();
    writer.close();

    IndexSearcher searcher = newSearcher(ir);
    Sort sort = new Sort(new SortField("value", SortField.Type.INT));

    TopDocs td = searcher.search(new MatchAllDocsQuery(), 10, sort);
    assertEquals(3, td.totalHits.value);
    // numeric order
    assertEquals("-1", searcher.doc(td.scoreDocs[0].doc).get("value"));
    assertEquals("4", searcher.doc(td.scoreDocs[1].doc).get("value"));
    assertEquals("300000", searcher.doc(td.scoreDocs[2].doc).get("value"));

    ir.close();
    dir.close();
}
source: Lucene Unit Test on GitHub
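If it helps to picture why the test stores the value twice, here is a rough plain-Java sketch with no Lucene involved. The class name, the arrays and `sortDocIds` are all made up for illustration and are not how Lucene is implemented internally. The idea is just that DocValues behave like one column per field indexed by document id, which is cheap to sort on, while the stored field is what you read back when you display a hit.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocValuesSketch {
    // Hypothetical stand-in for stored fields: what you get back when you load a document.
    static final String[] STORED_VALUE = {"300000", "-1", "4"};

    // Hypothetical stand-in for a doc values column: one number per document id.
    static final long[] VALUE_COLUMN = {300000L, -1L, 4L};

    /** Returns the document ids ordered by their entry in the column (ascending). */
    static int[] sortDocIds(long[] column) {
        Integer[] ids = new Integer[column.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Sorting only touches the column, never the stored documents.
        Arrays.sort(ids, Comparator.comparingLong(id -> column[id]));
        return Arrays.stream(ids).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // The stored value is looked up only for display, after sorting is done.
        for (int docId : sortDocIds(VALUE_COLUMN)) {
            System.out.println("doc " + docId + " -> " + STORED_VALUE[docId]);
        }
    }
}
```

With the same three values as the unit test, the loop visits doc 1, then doc 2, then doc 0, which matches the -1, 4, 300000 order the test asserts.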
Unfortunately I'm a C# developer rather than a Java developer, so it's a little hard for me to write an example in Java that is closer to what you're asking for, since I don't yet have an easy way to test Java Lucene code. But I have provided a C# example below that uses Lucene.NET, which I think you will find very easy to translate to Java.
// Namespaces from the Lucene.NET 4.8 packages
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public void NumericDocValueSort() {
    Analyzer standardAnalyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
    Directory indexDir = new RAMDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(LuceneVersion.LUCENE_48, standardAnalyzer);
    IndexWriter indexWriter = new IndexWriter(indexDir, iwc);

    // The same Document instance is reused; its fields are cleared between adds.
    Document doc = new Document();
    doc.Add(new TextField("name", "A1", Field.Store.YES));
    //doc.Add(new StoredField("number", 1000L)); //uncomment this line to optionally be able to retrieve it from the doc later, can be done for every doc
    doc.Add(new NumericDocValuesField("number", 1000L));
    indexWriter.AddDocument(doc);

    doc.Fields.Clear();
    doc.Add(new TextField("name", "A2", Field.Store.YES));
    doc.Add(new NumericDocValuesField("number", 1001L));
    indexWriter.AddDocument(doc);

    doc.Fields.Clear();
    doc.Add(new TextField("name", "A3", Field.Store.YES));
    doc.Add(new NumericDocValuesField("number", 990L));
    indexWriter.AddDocument(doc);

    doc.Fields.Clear();
    doc.Add(new TextField("name", "A4", Field.Store.YES));
    doc.Add(new NumericDocValuesField("number", 300L));
    indexWriter.AddDocument(doc);

    indexWriter.Commit();

    IndexReader reader = indexWriter.GetReader(applyAllDeletes: true);
    IndexSearcher searcher = new IndexSearcher(reader);

    // Sort ascending on the long-valued doc values field "number".
    SortField sortField = new SortField("number", SortFieldType.INT64);
    Sort sort = new Sort(sortField);

    TopDocs docs = searcher.Search(new MatchAllDocsQuery(), 1000, sort);

    // Hits come back ordered by "number": A4 (300), A3 (990), A1 (1000), A2 (1001).
    foreach (ScoreDoc scoreDoc in docs.ScoreDocs) {
        Document curDoc = searcher.Doc(scoreDoc.Doc);
        string name = curDoc.Get("name");
    }

    reader.Dispose(); //reader.close() in java
    indexWriter.Dispose(); //writer.close() in java
}
I ran this code on my machine and the for loop returns the docs in the proper numeric order. Note that the reason I use NumericDocValuesField rather than SortedNumericDocValuesField (the field type that SortedNumericSortField sorts on) is that the latter is only needed if a single document contains multiple values for the field. Your example did not, so NumericDocValuesField is the one you want in that case.
People are often confused by the word Sorted in names like SortedNumericSortField. In this context it means that if a document contains multiple values for the field, those values are kept in sorted order within that document. It has nothing to do with needing the documents in sorted order. Yeah, I know, not the best naming approach, kinda confusing. Anyway, hopefully that solves it for you.
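To make the "sorted within the document" point concrete, here is another small plain-Java sketch. Again this is only a model: `storePerDocument` and `sortDocIdsByMin` are names I invented, and picking the smallest value per document is just one possible way to choose which of several values to sort by. It assumes every document has at least one value.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedNumericSketch {
    /** Models a multi-valued numeric field: each document's values are kept in ascending order. */
    static long[] storePerDocument(long... values) {
        long[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Orders document ids by the smallest value each document holds (assumes no empty documents). */
    static int[] sortDocIdsByMin(long[][] perDocValues) {
        Integer[] ids = new Integer[perDocValues.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Index 0 is the minimum because each document's values are already sorted.
        Arrays.sort(ids, Comparator.comparingLong(id -> perDocValues[id][0]));
        return Arrays.stream(ids).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        long[][] docs = {
            storePerDocument(7, 3, 5), // doc 0 holds three values
            storePerDocument(10, 1),   // doc 1 holds two values
            storePerDocument(2)        // doc 2 holds a single value
        };
        System.out.println(Arrays.toString(docs[0]));               // [3, 5, 7]
        System.out.println(Arrays.toString(sortDocIdsByMin(docs))); // [1, 2, 0]
    }
}
```

The first line of output shows the sorting that the word Sorted refers to, which happens inside one document. The second line is the separate step of ordering the documents themselves.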