I have a C# program whose pseudocode looks like the steps below. It runs correctly, but a single run takes 5 days, and I need to run it every day with new criteria. We are trying to tag every flower name that appears in biology books. Each day the input is a different textbook or journal in .txt format. We use Lucene search to make it faster. We have a list of FlowerFamilyID values in the database, and a CSV file of FlowerID|FlowerCommonName pairs with about 10,000 entries.
Step 1// Get FlowerFamilyID from SQL Server and dump it to a text file. This has about 100 entries:
FlowerFamilyID FamilyName
1 Acanthaceae
2 Agavaceae
Step 2// A CSV file has flowerid and flowercommonname, separated by "|". Read the CSV file (about 10,000 entries) and store the entries in a list, e.g.:
1|Rose
2|American water willow
3|false aloe
Step 3// A Lucene index is created on the book/journal for that day.
Step 4// For every FlowerFamilyID from the data text file, call searchText(flowerfamilyID, flower list), which returns all flowers found in that book/journal.
Step 5// In the search function I call the Lucene query parser and search for each of the 10,000 flower entries; if a flower is found, I store the first hit with its score in a list.
    public static List<flowerResult> searchText(String flowerfamilyid, List<flower> flowers)
    {
        DateTime startdate = DateTime.Now;
        List<flowerResult> results = new List<flowerResult>();
        Document doc = new Document();
        foreach (var flower in flowers)
        {
            // Normalize the common name and count its words.
            string[] separators = { ",", ".", "!", "?", ";", ":", " " };
            string value = flower.getFlower().Trim().ToLower();
            string[] words = value.Split(separators, StringSplitOptions.RemoveEmptyEntries);
            // Multi-word names are quoted so Lucene treats them as a phrase.
            String criteria = string.Empty;
            if (words.Length > 1)
                criteria = "\"" + value + "\"";
            else
                criteria = value;
            if (string.IsNullOrEmpty(criteria))
                continue;
            criteria = criteria.Replace("\r", " ");
            criteria = criteria.Replace("\n", " ");
            QueryParser queryParser = new QueryParser(VERSION, "body", analyzer);
            string special = " +body:" + criteria;
            Query query = queryParser.Parse(special);
            try
            {
                // The reader and searcher are opened and disposed for every flower.
                IndexReader reader = IndexReader.Open(luceneIndexDirectory, true);
                Searcher indexSearch = new IndexSearcher(reader);
                // Only the top hit is requested.
                TopDocs hits = indexSearch.Search(query, 1);
                if (hits.TotalHits > 0)
                {
                    float score = hits.ScoreDocs[0].Score;
                    if (score > MINSCORE)
                    {
                        flowerResult result = new flowerResult(flower.getId(), flower.getFlower(), score);
                        results.Add(result);
                    }
                }
                indexSearch.Dispose();
                reader.Dispose();
                indexWriter.Dispose();
            }
            catch (ParseException e)
            {
                // Logging disabled: "Could not parse article. Details: " + e.Message
            }
        }
        return results;
    }
    public class flower
    {
        public long flowerID { get; set; }
        public string familyname { get; set; }
        public string flower { get; set; } // common name
    }
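For reference, Step 2 (loading the flower list) can be sketched roughly as follows. This is a minimal sketch and not my exact code: FlowerEntry, FlowerCsv, Parse and Load are hypothetical names used only for illustration, and it assumes every line has the form id|common name as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical holder for one "id|common name" line (illustration only).
public sealed class FlowerEntry
{
    public long Id { get; private set; }
    public string CommonName { get; private set; }

    public FlowerEntry(long id, string commonName)
    {
        Id = id;
        CommonName = commonName;
    }
}

public static class FlowerCsv
{
    // Parses lines such as "2|American water willow".
    // Blank or malformed lines are skipped rather than throwing.
    public static List<FlowerEntry> Parse(IEnumerable<string> lines)
    {
        var entries = new List<FlowerEntry>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue;

            // Split on the first '|' only, so the name keeps any later '|'.
            string[] parts = line.Split(new[] { '|' }, 2);
            if (parts.Length != 2)
                continue;

            long id;
            if (!long.TryParse(parts[0].Trim(), out id))
                continue;

            string name = parts[1].Trim();
            if (name.Length == 0)
                continue;

            entries.Add(new FlowerEntry(id, name));
        }
        return entries;
    }

    // Reads the file line by line and parses it (the path is supplied by the caller).
    public static List<FlowerEntry> Load(string path)
    {
        return Parse(File.ReadLines(path));
    }
}
```

The list is built once per run and then passed to the search function for each family ID.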
I tried running this, and it completed in 5 days. But I need it to finish within a day because the results are used for further analysis. So I split the CSV file into 10 different files, and the job completed in 2 days. My team leader told me to use multiple threads to improve the speed, but I have no clue how to do that. Can somebody help me?
Thanks R