I am using Lucene 3.6.2 on Android. The code and my observations are below.

Indexing Code:

public void indexBookContent(Book book, File externalFilesDir) throws Exception {
    IndexWriter indexWriter = null;
    NIOFSDirectory directory = null;

    directory = new NIOFSDirectory(new File(externalFilesDir.getPath() + "/IndexFile", book.getBookId()));
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(LUCENE_36, new StandardAnalyzer(LUCENE_36));
    indexWriter = new IndexWriter(directory, indexWriterConfig);

    Document document = createFieldsForContent();

    String pageContent = Html.fromHtml(decryptedPage).toString();
    ((Field) document.getFieldable("content")).setValue(pageContent);
    ((Field) document.getFieldable("content")).setValue(pageContent);
    ((Field) document.getFieldable("content")).setValue(pageContent.toLowerCase());
}

private Document createFieldsForContent() {
    Document document = new Document();

    Field contentFieldLower = new Field("content", "", YES, NOT_ANALYZED);
    document.add(contentFieldLower);
    Field contentField = new Field("content", "", YES, ANALYZED);
    document.add(contentField);
    Field contentFieldNotAnalysed = new Field("content", "", YES, NOT_ANALYZED);
    document.add(contentFieldNotAnalysed);
    Field recordIdField = new Field("recordId", "", YES, ANALYZED);
    document.add(recordIdField);
    return document;
}

public JSONArray searchBook(String bookId, String searchText, File externalFieldsDir, String filter) throws Exception {
    List<SearchResultData> searchResults = null;
    NIOFSDirectory directory = null;
    IndexReader indexReader = null;
    IndexSearcher indexSearcher = null;

    directory = new NIOFSDirectory(new File(externalFieldsDir.getPath() + "/IndexFile", bookId));
    indexReader = IndexReader.open(directory);
    indexSearcher = new IndexSearcher(indexReader);

    Query finalQuery = constructSearchQuery(searchText, filter);

    TopScoreDocCollector collector = TopScoreDocCollector.create(100, false);
    indexSearcher.search(finalQuery, collector);
    ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
}

private Query constructSearchQuery(String searchText, String filter) throws ParseException {
    QueryParser contentQueryParser = new QueryParser(LUCENE_36, "content", new StandardAnalyzer(LUCENE_36));
    contentQueryParser.setAllowLeadingWildcard(true);
    contentQueryParser.setLowercaseExpandedTerms(false);

    String wildCardSearchText = "*" + QueryParser.escape(searchText) + "*";

    // Query Parser used.
    Query contentQuery = contentQueryParser.parse(wildCardSearchText);
    return contentQueryParser.parse(wildCardSearchText);
}

I have gone through "Lucene: Multi-word phrases as search terms", and my logic doesn't seem to be any different.

My suspicion is that the fields are getting overwritten. I also need Chinese language support, which works with this code apart from the problem with searches of two or more words.

Zooter
  • I don't understand what your exact problem is. Is it, as in the link you mention, that entering multiple words does not return correct results? Which field do you search, and with which query? Please give an example. – Eypros Jun 20 '14 at 07:19
  • Let me state my observations here. Searching for a single word works fine, as does searching for single Chinese words and special characters. But if I search for two words, I do not get any results. I'll update the code above to specify the query details. – Zooter Jun 20 '14 at 07:58

1 Answer

One note, up front:

A search implementation like this looks strange at first glance. It is an overly complicated way to do a linear search through all the available strings. I don't know exactly what you need to accomplish, but I suspect you would be better served by working on appropriate analysis of your text than by running a double wildcard against keyword-analyzed text, which will perform poorly and provide little flexibility in the search.
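To make the "linear search" point concrete: a leading-and-trailing wildcard over untokenized values behaves roughly like a case-sensitive substring check applied to every stored string in turn. The following is a plain-Java sketch of that equivalent scan, not Lucene code; the class and method names are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: what a "*text*" wildcard over whole,
// untokenized page strings amounts to in plain Java.
class SubstringScan {
    // Returns the positions of all pages whose text contains the needle.
    // Every page is visited, so the cost grows with the total text size.
    static List<Integer> scan(List<String> pages, String needle) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < pages.size(); i++) {
            if (pages.get(i).contains(needle)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

If this is all the search needs to do, the index adds little; the benefit of Lucene comes from tokenized, analyzed terms that can be looked up without scanning every value.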


Moving on to more specific issues:

You are analyzing the same content in the same field multiple times with different analysis methods.

Field contentFieldLower = new Field("content", "", YES, NOT_ANALYZED);
document.add(contentFieldLower);
Field contentField = new Field("content", "", YES, ANALYZED);
document.add(contentField);
Field contentFieldNotAnalysed = new Field("content", "", YES, NOT_ANALYZED);
document.add(contentFieldNotAnalysed);

Instead, if you really need all these analysis methods to be available for searching, you should probably be indexing them in distinct fields. Searching these together doesn't make sense, so they shouldn't be in the same field.

Then you have this sort of pattern:

Field contentField = new Field("content", "", YES, ANALYZED);
document.add(contentField);
//Somewhat later
((Field) document.getFieldable("content")).setValue(pageContent);

Don't do this; it doesn't make sense. Just pass your content into the constructor, and add the field to your document:

Field contentField = new Field("content", pageContent, YES, ANALYZED);
document.add(contentField);

This matters especially if you opt to continue analyzing in multiple ways in the same field, because there is no way to pick out one particular Field among them by name (getFieldable will always return the first one added).

And this query:

String wildCardSearchText = "*" + QueryParser.escape(searchText) + "*";

As you mentioned, this won't work well with multiple terms. It runs afoul of QueryParser syntax. What you end up with is something like *two terms*, which will be searched as:

field:*two field:terms*

That won't generate any matches against your keyword field (presumably). The QueryParser doesn't handle this sort of query well at all, so you'll need to construct the wildcard query yourself:

WildcardQuery query = new WildcardQuery(new Term("field", "*two terms*"));
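
To see why the split clauses fail where the single pattern succeeds, here is a small plain-Java model of wildcard matching against one untokenized value. It is only a sketch under the assumption that `*` matches any run of characters and `?` matches exactly one; it is not Lucene's implementation, and the class name is hypothetical.

```java
// Hypothetical model of matching a wildcard pattern against one whole term.
class WildcardModel {
    // '*' matches any run of characters (including none), '?' exactly one.
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int star = -1; // position of the last '*' seen in the pattern
        int mark = 0;  // text position to retry from after that '*'
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                // Backtrack: let the last '*' absorb one more character.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Only trailing '*' characters may remain unmatched.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

With a stored value such as "one two terms here", the single pattern `*two terms*` matches under this model, while neither `*two` nor `terms*` does, because each of those would have to match the entire value from start to end.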
femtoRgon
  • Thanks for the note. The reason I used document.getFieldable was that I was creating various documents for items other than the "content" using the same method. I have corrected that now. Works well. Thanks. – Zooter Jun 25 '14 at 06:50