3

I want to implement an algorithm in Java to find the nearest similar Strings.

I have station_names in mysql database like - 23 ST, 233 ST, 21 ST, 14 St Times Sq, 24 ST

and if user enters a search string like 23rd station then I should return 23 ST and 233 ST or if user enters like Times Square then result should be 14 St Times Sq.

I found many algorithms on internet but I m confused to which one to use.

Can you please suggest me the best algorithm that can I implement in Java?

Thanks in advance

Deepu
  • 2,590
  • 19
  • 52
  • 74
  • 1
    *"Can you please suggest me the best algorithm"* I'd typically go for the one with polka-dots, since it is prettier. Of course, your definition of 'better' might not include the visual effect, so why don't you tell us what *you* mean by better? – Andrew Thompson Dec 26 '12 at 12:40
  • Thanks Andrew for ur reply, best algorithm means that will result most similar strings that user want to search, e.g. for 23 ST user can give search strings like 23rd Station / 23 Station / 23rd St ect – Deepu Dec 26 '12 at 12:45
  • http://en.wikipedia.org/wiki/String_searching_algorithm talks about some of the popular algorithm but you need to implement them in Java – AurA Dec 26 '12 at 13:02
  • Might help. not sure. http://stackoverflow.com/q/2891514/1135954 – mtk Dec 26 '12 at 13:02
  • Thank you @AurA and mtk for your suggestions – Deepu Dec 26 '12 at 13:17

2 Answers2

2

To answer your question, there is no best algorithm in general, only the one that works best in your specific case.

You will want to define one or more metrics to measure differences between the input and the strings you have in the DB and then sort the results by score (see String metric).

The problem is that the most similar string is not always the closest address. That's why I said you have to define your own metric.

Sandi Hrvić
  • 211
  • 1
  • 1
1

There are many possible ways to do this. For example you might say that 21 ST is closer to 23rd station than 233 ST. You have to work out what you want and find the approach which will match it best.

It is likely you might need multiple approaches and then score the results. This is what I would do.

You can test the different approach by providing a large sample data test suite and finding out which of the approaches (or a combination) give you the highest success rate.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • Thanks Peter for your answer, I want to return the most similar strings that user want to search, for e.g. for **23 ST**(actual station name) user can enter search strings - **23rd Station / 23 Station / 23rd St** – Deepu Dec 26 '12 at 12:47
  • Can you define "most similar"? While it is something most people have an idea of, for a computer you need to define it formally. – Peter Lawrey Dec 26 '12 at 13:10