I am building an Android app for registering user complaints about specific government-related issues. I would like each complaint in my database to be unique, without any ambiguity. I am using PHP and a MySQL database. I would like to measure the similarity between complaints using software like WordNet, so that I can eliminate identical complaints and suggest edits to the user. How can I do this? Is WordNet the only option, or is there another reliable method?
Use a string distance algorithm to compute how far potential new entries are from existing ones. Start here: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance – JLB Mar 22 '16 at 16:29
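To make the edit-distance suggestion concrete, here is a minimal sketch of the optimal string alignment (OSA) distance. OSA is the "restricted" Damerau-Levenshtein variant that most simple implementations use. It counts insertions, deletions, substitutions, and swaps of two adjacent characters. It is not the unrestricted Damerau-Levenshtein distance described on the linked page: for example, OSA gives 3 for "ca" vs "abc", where the true distance is 2. The function names and the 20% threshold in `is_near_duplicate` are hypothetical choices for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters, e.g. "ro" -> "or"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def is_near_duplicate(new: str, existing: str, max_ratio: float = 0.2) -> bool:
    # Hypothetical rule: flag the new complaint if the edits needed are at most
    # max_ratio of the longer text's length (after lowercasing and trimming)
    a, b = new.lower().strip(), existing.lower().strip()
    longest = max(len(a), len(b), 1)
    return osa_distance(a, b) / longest <= max_ratio
```

With these definitions, `is_near_duplicate("Broken street light", "borken street light")` is `True` because one transposition fixes the typo. Edit distance compares characters only, so it catches typos and small rewordings but not synonyms.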
Does this algorithm detect synonyms between two strings? Also, it would be better if there were an existing tool for this so that I can finish my work sooner; implementing it myself seems like it would take a long time. – vicky Mar 22 '16 at 16:47
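On the synonym question: a character-level distance has no notion of meaning, so "street" and "road" look completely different to it. One common workaround is to map each word to a canonical term before comparing. WordNet's synsets are one possible source for that mapping. The sketch below uses a small hand-written `SYNONYMS` table as a hypothetical stand-in, and then compares the canonical word sets with Jaccard overlap.

```python
import re

# Hypothetical synonym table; a real system might build this from WordNet
# synsets or from a domain-specific list of complaint vocabulary.
SYNONYMS = {
    "street": "road",
    "lane": "road",
    "garbage": "waste",
    "trash": "waste",
    "rubbish": "waste",
    "lamp": "light",
}


def canonical_tokens(text: str) -> set[str]:
    # Lowercase, extract alphabetic words, and replace each with its canonical form
    words = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS.get(w, w) for w in words}


def shared_meaning(a: str, b: str) -> float:
    # Jaccard overlap of canonical word sets: 1.0 means identical after mapping
    ta, tb = canonical_tokens(a), canonical_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```

Here `shared_meaning("Trash on the street", "garbage on the road")` is `1.0`, even though the two strings share few characters. The quality of this approach depends entirely on the synonym table.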
A discussion about what you are trying to do: http://stackoverflow.com/questions/12094326/match-similar-variations-of-words-suffixes-in-mysql – JLB Mar 22 '16 at 17:06
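The linked discussion is about matching word variations such as different suffixes. A rough sketch of that idea is to strip common suffixes so that "leaking", "leaked", and "leaks" all reduce to "leak", and then build a normalized key from the result. The suffix list and helper names below are hypothetical and deliberately crude. A real stemmer, such as Porter's, handles far more cases and exceptions.

```python
import re

# Hypothetical, deliberately crude suffix list (checked in this order)
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words like "is" are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_key(text: str) -> str:
    # Sorted, de-duplicated stems; such a key could, for example, be stored in
    # an extra column and compared with a plain equality lookup
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(sorted({crude_stem(w) for w in words}))
```

Both "water pipe leaking" and "Leaked water pipes." produce the key `"leak pipe water"`. The trade-off is that crude rules also mangle some words (for example, "this" becomes "thi"). This is harmless for matching as long as the same rules are applied on both sides.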
1 Answer
Recommendations (the kind you are asking for, not the kind you are building) are not allowed on Stack Overflow, and I expect this question to be closed. Hopefully I can finish this before that happens.
Although duplicates and ambiguity seem like things you want to get rid of, there is a lot of value in recording everything people say in the way they say it. This is true even here on Stack Overflow: questions might be closed as duplicates, but we don't remove them. We keep them around so that if someone phrases the question that way and lands here from Google, we still capture their understanding of the problem.
The problem you're facing is more one of product design than of algorithms. Regardless of which matching algorithm you choose to determine similarity (there are many), you still have to decide what effect declaring two complaints "similar enough" should have on your users. That will probably guide your decision on how to measure similarity (e.g., word similarity, character n-gram similarity, or conceptual similarity).
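To show how much the choice of similarity measure matters, here is a small sketch that scores the same pair of complaints with word-level and character-trigram Jaccard similarity. Conceptual (meaning-based) similarity, for example via WordNet, is not shown. All function names are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    # Word similarity: compare whole lowercase words
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngrams(text: str, n: int = 3) -> set[str]:
    # Character n-gram similarity: collapse whitespace and pad with spaces
    # so that word boundaries also produce n-grams
    s = " " + " ".join(text.lower().split()) + " "
    return {s[i : i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection divided by size of the union
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


x, y = "streetlight broken", "street light broken"
print(jaccard(word_set(x), word_set(y)))        # word level
print(jaccard(char_ngrams(x), char_ngrams(y)))  # character trigrams
```

For this pair, the word-level score is 0.25 because "streetlight" does not match "street" or "light". The trigram score is about 0.76 because most three-character fragments are shared. Which of these counts as "similar enough" is exactly the product decision described above.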
Once you choose a similarity measure and run into a specific problem getting the effect you want (meaning you can describe your inputs and the output you expect), that's the kind of question you can ask here.