
I have a table containing multiple records whose texts may be different, similar, or partially similar.

For example:

record 1 : Stack overflow forum is very useful. This helps developers and researchers a most.

record 2 : There are several very useful forums available that helps developers and researchers.

record 3 : This stack overflow forum is very useful. This helps developers and researchers a most.

record 4 : This text should not be considered.

Consider records 1 and 3: they are the same, and they are marked as duplicates because I generate a hash code for each record.

Record 4 contains totally different text.

Records 1 and 2, on the other hand, have nearly the same meaning and share most of their words.

When these two records are compared, the percentage of words they have in common is high.

So I need to extract records like these based on that percentage.
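To make "percentage of similar words" concrete, one possible reading is the Jaccard overlap of the two word sets: distinct shared words divided by all distinct words. The sketch below is only an illustration under that assumption; the class name `WordOverlap`, the method names, and the tokenization rule (lower-case, split on non-alphanumerics) are hypothetical choices, not something taken from an existing library.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: word-level Jaccard similarity expressed as a percentage.
final class WordOverlap {

    // Lower-cases the text and splits on anything that is not a letter or digit.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Shared distinct words divided by all distinct words, times 100.
    static double similarityPercent(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        if (wordsA.isEmpty() && wordsB.isEmpty()) {
            return 100.0; // two empty texts are treated as identical (an assumption)
        }
        Set<String> shared = new HashSet<>(wordsA);
        shared.retainAll(wordsB);
        Set<String> all = new HashSet<>(wordsA);
        all.addAll(wordsB);
        return 100.0 * shared.size() / all.size();
    }
}
```

With this measure, records whose score exceeds some chosen threshold would be reported as similar. The threshold itself is a tuning decision that depends on the data.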

Is there an algorithm in Java that can do this?

Any guidance would be appreciated.

Phantômaxx
SasiRSK
  • Your actual question is: I need a Java algorithm to calculate string similarity. The other 90% of the question text is irrelevant. And I suggest you google for that first, because asking for resources is off-topic here. – Jan Doggen Mar 20 '15 at 09:20

1 Answer


You can use fuzzy string search for your requirement. Maybe this post will help you out. For searching in a database, you can also use Hibernate Search. See Hibernate Querying.
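As a rough illustration of what fuzzy matching can look like in plain Java, here is a minimal sketch based on Levenshtein edit distance, one common building block for fuzzy string search. It is not necessarily the technique the linked post or Hibernate Search uses; the class name `EditSimilarity` and the normalization to a percentage are assumptions made for this example.

```java
// Hypothetical helper: character-level similarity derived from Levenshtein distance.
final class EditSimilarity {

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the distance to 0..100, where 100 means the strings are identical.
    static double similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100.0;
        }
        return 100.0 * (longest - levenshtein(a, b)) / longest;
    }
}
```

Edit distance works on characters, so it is sensitive to word order. For whole sentences, a word-based measure may match the "percentage of similar words" idea in the question more closely.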

Community
Prabhat