How can I update (delete/insert) a document based on a numeric field? So far I did this:

LuceneManager.updateDocument(writer, new Term("id",  NumericUtils.intToPrefixCoded(sentenceId)), newDoc);

But in Lucene 4.0 the NumericUtils class has changed, and I don't really understand the new version. Any help?

Daniel Gerber
  • Is there a particular reason you were transforming numbers with NumericUtils manually, rather than using a [NumericField](http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/document/NumericField.html)? – femtoRgon Dec 19 '12 at 18:33
  • Well because the IndexWriter wants a Term, and I didn't know any other way to create a Term for a Numeric Field – Daniel Gerber Dec 20 '12 at 10:50

5 Answers

With Lucene 5.x, this can be solved with the code below:

    int id = 1;
    BytesRefBuilder brb = new BytesRefBuilder();
    NumericUtils.intToPrefixCodedBytes(id, 0, brb);
    Term term = new Term("id", brb.get());
    indexWriter.updateDocument(term, doc); // or indexWriter.deleteDocuments(term);
Jet Yang

You can use it this way:

First you must set the FieldType's numeric type:

FieldType TYPE_ID = new FieldType();
...
TYPE_ID.setNumericType(NumericType.INT);
TYPE_ID.freeze();

and then:

int id = 10;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
// Encode the int at shift 0 (full precision) into the byte buffer
NumericUtils.intToPrefixCoded(id, 0, bytes);
Term idTerm = new Term("id", bytes);

and now you'll be able to use idTerm to update the doc.
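
To illustrate why the term has to be built from the encoded bytes rather than from the decimal string, here is a rough standard-library-only sketch. It is not Lucene's code: the class name `PrefixCodedSketch`, the `encode` helper and the marker constant are hypothetical, and the byte layout (one marker byte followed by 7-bit chunks of the sign-flipped value) is only my assumption of how a full-precision int term is laid out. The point is simply that the indexed bytes for 1 are not the bytes of the string "1", which would explain why a plain "id:1" term matches nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixCodedSketch {
    // Assumed marker byte for an int term at shift 0 (illustrative only).
    static final int SHIFT_START_INT = 0x60;

    // Hypothetical helper: encodes an int as a marker byte plus five 7-bit chunks.
    static byte[] encode(int value) {
        // Flip the sign bit so that unsigned byte order follows numeric order.
        int sortableBits = value ^ 0x80000000;
        byte[] out = new byte[6];
        out[0] = (byte) SHIFT_START_INT;
        // Fill from the last byte backwards, 7 bits at a time.
        for (int i = 5; i >= 1; i--) {
            out[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] coded = encode(1);
        byte[] plain = "1".getBytes(StandardCharsets.UTF_8);
        System.out.println("encoded term bytes: " + Arrays.toString(coded));
        System.out.println("plain string bytes: " + Arrays.toString(plain));
        System.out.println("same bytes? " + Arrays.equals(coded, plain));
    }
}
```

In real code you would of course let NumericUtils produce the bytes, as in the snippet above; the sketch only shows that the two representations differ.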

Sayyid

With Lucene 4, you can now create IntField, LongField, FloatField or DoubleField like this:

document.add(new IntField("id", 6, Field.Store.NO));

To write the document once you have modified it, the call is still:

indexWriter.updateDocument(new Term("pk", "<pk value>"), document);

EDIT: And here is a way to make a query including this numeric field:

// Query <=> id <= 7
Query query = NumericRangeQuery.newIntRange("id", Integer.MIN_VALUE, 7, true, true);
TopDocs topDocs = indexSearcher.search(query, 10);
aymeric
  • So I just use the toString() value of the Integer/Float/Long/Double object? – Daniel Gerber Dec 19 '12 at 18:25
  • No, all these fields have a method [numericValue()](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/document/Field.html#numericValue()) that returns the number of the underlying field. To know which type the number is (int, long...), you either use `instanceof` or use [FieldType.NumericType](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/document/FieldType.NumericType.html) – aymeric Dec 19 '12 at 18:31
  • I don't think this can work this way. If I have the query = new TermQuery(new Term(LUCENE_FIELD_ID, new IntField(LUCENE_FIELD_ID, 1, Store.YES).stringValue())); then the query looks like "id:1", which does not return any results, whereas NumericUtils.intToPrefixCoded(1) returns results. What am I doing wrong? – Daniel Gerber Dec 20 '12 at 10:44
  • The way to query has changed as well. Now, instead of a TermQuery, you should use a [NumericRangeQuery](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/NumericRangeQuery.html). – aymeric Dec 20 '12 at 16:29
  • Thx. I figured this also out today. Makes life much easier. :) I still don't really get how to update my document based on a "integer-term" – Daniel Gerber Dec 20 '12 at 20:27
  • TermQuery can't be used to `updateDocuments` 10 years later I'm afraid – borowis Aug 08 '22 at 16:58
  • I guess Lucene 4 hasn't changed much in the last 10 years. But I also hope people are now using newer versions ;) – aymeric Aug 15 '22 at 08:21

If possible, I would recommend storing the ID as a keyword string rather than a number. If it is simply a unique identifier, indexing it as a keyword makes much more sense and removes any need to mess with numeric formatting.

If it is actually being used as a number, then you might need to perform the update manually. That is, search for and fetch the document you wish to update, delete the old document with tryDeleteDocument, and then add the updated version with addDocument. This is basically what updateDocument does anyway, to my knowledge.

The first option would certainly be the better way, though. A non-numeric field to use as an update ID would make life easier.

femtoRgon
  • You are right having the key as a string value would be cool, but this would mean a major refactoring throughout my code and reindexing the corpus, for which I sadly don't have time. I will give it a try! – Daniel Gerber Dec 20 '12 at 20:30

According to the Lucene 4.0.0 documentation, the ID field should be indexed with the StringField class:

"A field that is indexed but not tokenized: the entire String value is indexed as a single token. For example this might be used for a 'country' field or an 'id' field, or any field that you intend to use for sorting or access through the field cache."

I had the same problem as you and I solved it by making this change. After that, my update and delete worked perfectly.

Deise