I'm trying to match the text Config migration from ASA5505 8.2 to ASA5516 in the TITLE column.

My program looks like this.

Directory directory = FSDirectory.open(indexDir);

MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));        
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);       
queryParser.setPhraseSlop(0);
queryParser.setLowercaseExpandedTerms(true);
Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
System.out.println(query); // print the parsed query to see how it was analyzed
TopDocs topDocs = searcher.search(query,100);
System.out.println(topDocs.totalHits);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println(hits.length + " Record(s) Found");
for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println("\"Title :\" " +d.get("TITLE") );
}

But it's returning

"Title :" Config migration from ASA5505 8.2 to ASA5516
"Title :" Firewall  migration from ASA5585 to  ASA5555
"Title :" Firewall  migration from ASA5585 to  ASA5555

The last two results are not expected. What modification is required to match the exact text Config migration from ASA5505 8.2 to ASA5516?

And my indexing function looks like this

public class Lucene {
public static final String INDEX_DIR = "./Lucene";
private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";

private static final String USER_NAME = "localhost";
private static final String PASSWORD = "localhost";
private static final String QUERY = "select * from TITLE_TABLE";

public static void main(String[] args) throws Exception {
    File indexDir = new File(INDEX_DIR);
    Lucene indexer = new Lucene();
    try {
        Date start = new Date();
        Class.forName(JDBC_DRIVER).newInstance();
        Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
        SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
        System.out.println("Indexing to directory '" + indexDir + "'...");
        int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
        indexWriter.close();
        System.out.println(indexedDocumentCount + " records have been indexed successfully");
        System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

int indexDocs(IndexWriter writer, Connection conn) throws Exception {
    String sql = QUERY;
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(100000);
    ResultSet rs = stmt.executeQuery(sql);
    int i = 0;
    while (rs.next()) {
        System.out.println("Adding Doc No:" + i);
        Document d = new Document();
        System.out.println(rs.getString("TITLE"));
        d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
        d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(d);
        i++;
    }
    return i;
}
}
Santosh Hegde

3 Answers

PVR is correct that a phrase query is probably the right solution here, but they missed how to use the PhraseQuery class. You are already using QueryParser, though, so just use the query parser syntax and enclose your search text in quotes:

Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");
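
To see why the quotes matter: without them, the query parser treats the whitespace-separated words as separate optional clauses (the default operator is OR), so any title that shares a word such as "migration" is a hit. The following is a plain-Java sketch of that difference, not Lucene code. The class `PhraseVsTermsSketch` and its helpers `tokens`, `matchesAnyTerm` and `matchesPhrase` are hypothetical names, and the tokenizer is a bare lowercase whitespace split rather than a real analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class PhraseVsTermsSketch {

    // Hypothetical tokenizer: lowercase, then split on whitespace (not a Lucene analyzer).
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Roughly what the unquoted query does: a hit if ANY query term occurs in the title.
    static boolean matchesAnyTerm(String title, String query) {
        return !Collections.disjoint(tokens(title), tokens(query));
    }

    // Roughly what the quoted query with slop 0 does: all terms, adjacent and in order.
    static boolean matchesPhrase(String title, String query) {
        return Collections.indexOfSubList(tokens(title), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        String query = "Config migration from ASA5505 8.2 to ASA5516";
        String[] titles = {
            "Config migration from ASA5505 8.2 to ASA5516",
            "Firewall  migration from ASA5585 to  ASA5555"
        };
        for (String title : titles) {
            System.out.println(title
                + " | any term: " + matchesAnyTerm(title, query)
                + " | phrase: " + matchesPhrase(title, query));
        }
    }
}
```

In this sketch the "Firewall" title passes the any-term check because it shares "migration", "from" and "to" with the query, but it fails the phrase check.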

Based on your update, you are using a different analyzer at index-time and query-time. SimpleAnalyzer and StandardAnalyzer don't do the same things. Unless you have a very good reason to do otherwise, you should analyze the same way when indexing and querying.

So, change the analyzer in your indexing code to StandardAnalyzer (or vice-versa, use SimpleAnalyzer when querying), and you should see better results.
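
To make the mismatch concrete, here is a rough plain-Java approximation of what the two analyzers do to the title. It is a sketch only, not Lucene's implementation: `simpleLike` imitates SimpleAnalyzer (break at every non-letter, then lowercase), `standardLike` imitates StandardAnalyzer (keep tokens such as asa5505 and 8.2, lowercase, drop stopwords). Both method names, the class name and the tiny stopword list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerMismatchSketch {

    // A small stopword subset chosen for this example only (assumption).
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "to", "of"));

    // Approximates SimpleAnalyzer: lowercase, then break at every non-letter (ASCII only here).
    static List<String> simpleLike(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Approximates StandardAnalyzer: lowercase, split on whitespace, drop stopwords.
    static List<String> standardLike(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        String title = "Config migration from ASA5505 8.2 to ASA5516";
        System.out.println("index time (simple-like):   " + simpleLike(title));
        System.out.println("query time (standard-like): " + standardLike(title));
    }
}
```

Under these assumptions the index holds asa where the query asks for asa5505, and 8.2 never reaches the index at all, so a phrase query built with the other analyzer cannot line up with the indexed tokens.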

femtoRgon
  • Still no result. The search text is converted to lowercase like this: **TITLE:"config migration from asa5505 8.2 ? asa5516"**. Where is the **?** coming from? – Santosh Hegde May 28 '16 at 15:34
  • The ? represents a removed stopword, and is expected behavior for StandardAnalyzer. This works in my tests. I'd look at how the field is being indexed: which analyzer is being used there, and so on. – femtoRgon May 28 '16 at 17:49
  • @SantoshHegde - I've edited with the problem I see now with your added indexing code. Hopefully that solves the problem for you. – femtoRgon May 28 '16 at 19:46

Try PhraseQuery as follows:

String searchTerm = "config migration from asa5505 8.2 to asa5516";
// Build ONE PhraseQuery and add every term to it, in order.
// No analyzer runs here, so each term must equal a token stored in the index.
PhraseQuery mainQuery = new PhraseQuery();
for (String term : searchTerm.split(" ")) {
    mainQuery.add(new Term("TITLE", term));
}

And then execute the mainQuery.

Check out this Stack Overflow thread; it may help you use PhraseQuery for an exact search.

PVR
  • That is not how building a `PhraseQuery` works. You need to add your terms to the query separately (`query.add(new Term("Title", "config")); query.add(new Term("Title", "migration")); ...`). Since it's being constructed manually, you don't have an Analyzer to rely on. – femtoRgon May 28 '16 at 10:30
  • How do I do an exact match? – Santosh Hegde May 28 '16 at 17:42
  • femtoRgon, thanks for your comment; I have edited my answer. – PVR May 28 '16 at 18:45

Here is what I have written for you; it returns the exact match in my tests:

Use: queryParser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");

  1. To create indexes

    public static void main(String[] args) 
    {
    
        IndexWriter writer = getIndexWriter();
        Document doc = new Document();
        Document doc1 = new Document();
        Document doc2 = new Document();
        doc.add(new Field("TITLE", "Config migration from ASA5505 8.2 to ASA5516",Field.Store.YES,Field.Index.ANALYZED));
        doc1.add(new Field("TITLE", "Firewall  migration from ASA5585 to ASA5555",Field.Store.YES,Field.Index.ANALYZED));
        doc2.add(new Field("TITLE", "Firewall  migration from ASA5585 to ASA5555",Field.Store.YES,Field.Index.ANALYZED));
        try 
        {
            writer.addDocument(doc);
            writer.addDocument(doc1);
            writer.addDocument(doc2);
            writer.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
    
    public static IndexWriter getIndexWriter()
    {
        IndexWriter indexWriter=null;
    
        try 
        {
        File file=new File("D://index//");
        if(!file.exists())
            file.mkdir();
        IndexWriterConfig conf=new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
        Directory directory=FSDirectory.open(file);
        indexWriter=new IndexWriter(directory, conf);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return indexWriter;
    }
    

    }

  2. To search for the string

    public static void main(String[] args) 
    {

    IndexReader reader=getIndexReader();

    IndexSearcher searcher = new IndexSearcher(reader);

    QueryParser parser = new QueryParser(Version.LUCENE_34, "TITLE" ,new StandardAnalyzer(Version.LUCENE_34));

    Query query;
    try 
    {
    query = parser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");

    TopDocs hits = searcher.search(query,3);

    ScoreDoc[] document = hits.scoreDocs;
    int i=0;
    for(i=0;i<document.length;i++)
    {
        // Look up each hit by its Lucene document ID, not by the loop index
        Document doc = searcher.doc(document[i].doc);

        System.out.println("TITLE=" + doc.get("TITLE"));
    }
        searcher.close();

    } 
    catch (Exception e) 
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } 
            }

public static IndexReader getIndexReader()
{
    IndexReader reader=null;

    Directory dir;
    try 
    {
        dir = FSDirectory.open(new File("D://index//"));
        reader=IndexReader.open(dir);
    } catch (IOException e) 
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return reader;
}   
PVR
  • Are you getting the result? Can you add a print statement before TopDocs hits = searcher.search(query,3); – Santosh Hegde May 28 '16 at 18:59
  • Yes, it returns the exactly matching document. If I print the query, it is as follows: TITLE:"config migration from asa5505 8.2 ? asa5516" – PVR May 28 '16 at 20:06
  • But what about case sensitivity? I want an exact match. – Santosh Hegde May 28 '16 at 20:12
  • Actually, the parser does that, so we don't have to worry about it; we have already passed a case-sensitive string. Do you have multiple TITLE values in your index that differ only in case? – PVR May 29 '16 at 02:24
  • Yes. The indexed data has identical TITLE values that differ only in case, so I have to fetch only the TITLE that matches exactly. – Santosh Hegde May 30 '16 at 05:43
  • StandardAnalyzer converts every term to lowercase because it applies LowerCaseFilter, so you can override StandardAnalyzer as shown here: http://www.codewrecks.com/blog/index.php/2012/07/05/case-sensitivity-in-lucene-search/ – PVR May 31 '16 at 18:25
  • @SantoshHegde you should accept the answer if it helps. – PVR Jun 13 '16 at 16:04
  • Hello, is there any way to do this with PyLucene? – oezlem Sep 14 '16 at 14:42
  • @oezlem PyLucene is a Python extension for accessing Java Lucene, so I think the same logic will be available there as well. – PVR Sep 14 '16 at 15:41
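
For the case-sensitive requirement raised in the comments, one possible alternative to a custom analyzer is to post-filter: run the lowercasing phrase query, then compare each hit's stored TITLE with the wanted string. The sketch below is plain Java under that assumption. `ExactCaseFilterSketch` and `keepExact` are hypothetical names, and the hard-coded list stands in for the values that d.get("TITLE") would return for each hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExactCaseFilterSketch {

    // Keep only the candidates that equal the wanted title character for character.
    // String.equals is case-sensitive, so titles differing only in case are dropped.
    static List<String> keepExact(List<String> candidates, String wanted) {
        List<String> exact = new ArrayList<>();
        for (String title : candidates) {
            if (title.equals(wanted)) {
                exact.add(title);
            }
        }
        return exact;
    }

    public static void main(String[] args) {
        // Hypothetical stored TITLE values returned by a case-insensitive phrase query.
        List<String> hits = Arrays.asList(
            "Config migration from ASA5505 8.2 to ASA5516",
            "config migration from asa5505 8.2 to asa5516");
        System.out.println(keepExact(hits, "Config migration from ASA5505 8.2 to ASA5516"));
    }
}
```

This only works because the field is stored (Field.Store.YES in the indexing code above), and it costs one string comparison per hit.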