I have an MS SQL Server database holding at least 39 GB of fingerprint data, and I need to find the duplicates within it. Each fingerprint record has the following structure (reduced here to the essentials):
[EMP ID] [fingerprint IMAGE] [fingerprint TEMPLATE (ISO)]
I'm using a C# program that does 1-to-1 comparison on the ISO template, with an algorithm based on Ratha's algorithm. The algorithm works and does detect duplicates, but the problem is the time it takes: comparing every record against every other record costs O(n²) one-to-one comparisons. Can anyone suggest a way to reduce the time cost of this kind of fingerprint matching?
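To make the cost concrete, here is a minimal sketch of the all-pairs loop I am describing (Python for brevity; the real program is C#). `match_score` is a hypothetical stand-in for the Ratha-based matcher, and the 0.5 threshold is an arbitrary placeholder, not a value from my program.

```python
from itertools import combinations


def match_score(template_a: bytes, template_b: bytes) -> float:
    """Hypothetical stand-in for the real 1-to-1 template matcher."""
    return 1.0 if template_a == template_b else 0.0


def find_duplicates(records, threshold=0.5):
    """Compare every record with every other record.

    records: list of (emp_id, iso_template) tuples.
    Returns the (emp_id, emp_id) pairs whose score reaches the threshold.
    """
    duplicates = []
    for (id_a, tpl_a), (id_b, tpl_b) in combinations(records, 2):
        if match_score(tpl_a, tpl_b) >= threshold:
            duplicates.append((id_a, id_b))
    return duplicates


def pair_count(n: int) -> int:
    """Number of 1-to-1 comparisons needed for n records: n(n-1)/2."""
    return n * (n - 1) // 2
```

For n records this runs n(n-1)/2 comparisons, so 100 records need 4,950 comparisons and 1233 records need 759,528.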
I read about SQL Server Integration Services (SSIS), but it is meant for ETL. I have to apply the matching algorithm here, and that can't be done with SSIS.
Here is a sample benchmark (approximate):
No.  Sample space  Compared  Time
1.   100           100       ~53 sec
2.   500           500       ~3.50 min
3.   1233          1233      ~1 hr 48 min
I have found other approaches that categorize fingerprints by extracted features, but how can I categorize based on the ISO template? Can anyone give some advice?
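What I have in mind is roughly the sketch below (Python for brevity; the real program is C#): group the records by some coarse key and run the expensive matcher only inside each group. The `coarse_key` function is purely hypothetical. What it should compute from an ISO template, and whether any such key is stable enough across two captures of the same finger, is exactly what I am asking.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, coarse_key):
    """Yield only the record pairs that fall into the same bucket.

    records:    iterable of (emp_id, iso_template) tuples.
    coarse_key: hypothetical function mapping a template to a hashable
                category; duplicates are assumed to share a category.
    """
    buckets = defaultdict(list)
    for emp_id, template in records:
        buckets[coarse_key(template)].append((emp_id, template))
    # The expensive 1-to-1 matcher would then run only on these pairs.
    for bucket in buckets.values():
        yield from combinations(bucket, 2)
```

With k evenly filled buckets this would cut the number of comparisons by roughly a factor of k, but any duplicate pair whose two records land in different buckets would be missed.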
I think Hadoop could be an option, but has anyone come across a fingerprint matching integration with Hadoop?