1

I have a dictionary with near around a million words. I have to design algorithm for quick search of sequence of characters.

For ex. if user types and the app must return words having the sequence like random,sand,stand ...etc.

The existing solution I have is to search for matching regex in all the existing words which is not efficient. I am open to restructure existing database, caching of dictionary or work at any level if required Or is there some ready made api in java?

Ajay George
  • 11,759
  • 1
  • 40
  • 48
Sankalp
  • 2,030
  • 7
  • 30
  • 41

2 Answers2

3

http://lucene.apache.org/core/

Look at this, this should cater to your requirements.

final File INDEX_DIR = new File("index");  
try{  
    Class.forName("com.mysql.jdbc.Driver").newInstance();  
    Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "root", "password");  
    StandardAnalyzer analyzer = new StandardAnalyzer();  
    IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);  
    System.out.println("Indexing to directory '" + INDEX_DIR + "'...");  
    indexDocs(writer, conn);  
    writer.optimize();  
    writer.close();  
} catch (Exception e) {  
    e.printStackTrace();  
}  

void indexDocs(IndexWriter writer, Connection conn) throws Exception {  
String sql = "select id, name, color from pet";  
Statement stmt = conn.createStatement();  
ResultSet rs = stmt.executeQuery(sql);  
while (rs.next()) {  
    Document d = new Document();  
    d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));  
    d.add(new Field("name", rs.getString("name"), Field.Store.NO,  Field.Index.TOKENIZED));  
    d.add(new Field("address", rs.getString("address"),Field.Store.NO, Field.Index.TOKENIZED));  
    writer.addDocument(d);  
  }  
}  
Himanshu Bhardwaj
  • 4,038
  • 3
  • 17
  • 36
  • This directs for having lucene set up , any suggestion of design of db or classes. – Sankalp Apr 05 '13 at 11:49
  • Well design of DB is upto you, what you want to index, should be a flattened structure. For building the index, its like, reading the resultset and adding to Index. Say you want to add three columns for a record to index, you can do something like: – Himanshu Bhardwaj Apr 05 '13 at 11:59
1

I'd try using a trie (Where do I find a standard Trie based map implementation in Java?). Using an in memory lucene index may also fit the bill, depending on your requirements

Community
  • 1
  • 1
qwerty
  • 3,801
  • 2
  • 28
  • 43