1

I have about 1000 rows in the database. For my purpose I need to find the most similar from DB.

For example the DB has rows:

England
Spain
English
French
France
Turkey
Ukrainian
.....
Scotland

and so on.

The source string I have, for example, English's(it's may be anything string, BUT about 95% it hasn't identical string in database, only similar.)

I have some algorithms that can find the most similar. It's based on one of these Wikipedia articles.

How can I find the similar among this 1000 rows? Go through all rows - it's very long? Maybe create index? Index of what items?

I need fresh thoughts.

UPDATE:

I didn't specify that it's the often-performed operations. It's perfomed in a loop of 50 and more items. And it's web app and I can't every time to access db. At first Time I select all rows, cache it in memory, and for next iterations I use data from cache. So I need do it in code(C#.NET)

melnynet
  • 69
  • 1
  • 8
  • Personally, I would write some code and see whether it fits my performance needs. If yes, I would be done, if not, I would try another algorithm and/or fire up some Performance Profiler tool. – Uwe Keim Dec 06 '14 at 23:21
  • 1
    Perhaps you could use Levenshtein. http://stackoverflow.com/questions/27293285/check-if-similar-value-exists-in-database/27293721#27293721 – Tim Schmelter Dec 06 '14 at 23:22
  • 1
    @TimSchmelter OP's page already lists it among the algos. – Sergey Kalinichenko Dec 06 '14 at 23:24
  • 1
    @dasblinkenlight: "It's based on one of these" is not very specific. I wanted to show a working sample on a similar question. Maybe using a .NET version is more efficient. http://levenshtein.blogspot.de/2011/04/how-it-is-done.html – Tim Schmelter Dec 06 '14 at 23:27
  • 1
    @LB . . . I disagree that this is a duplicate question, but I can't post my answer because it was closed. It seems inappropriate, but I put the answer on the other question. – Gordon Linoff Dec 06 '14 at 23:29
  • Since you mention a DB, if it's sql server, couldn't you use DIFFERENCE to get the SOUNDEX difference between two strings? http://msdn.microsoft.com/en-us/library/ms188753.aspx You wouldn't even need to fetch or iterate, just query for the closest to your input. – Pablo Romeo Dec 06 '14 at 23:43

0 Answers0