Consider the following strings:
- he llo
- goodbye
- hello
- = (goodbye)
- (he)(llo)
- good bye
- helium
I'm trying to sort these so that similar words come together. I know plain alphanumeric sorting is not an option. Removing special characters (" , - _ and so on) and then comparing certainly helps, but the results won't be as good as I hope for.
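To make the "remove special chars, then compare" idea concrete, here is a minimal sketch of what I mean. The class name `NormalizedSort` and the helper `normalize` are just names I made up for illustration, and I'm assuming that lower-casing and keeping only ASCII letters and digits is an acceptable way to strip the noise:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NormalizedSort {
    // Hypothetical helper: lower-case the string and drop everything
    // that is not an ASCII letter or digit.
    static String normalize(String s) {
        return s.toLowerCase().replaceAll("[^a-z0-9]", "");
    }

    // Sort a copy of the input by the normalized form of each string.
    // List.sort is stable, so strings with equal keys keep their input order.
    static List<String> sortByNormalized(List<String> input) {
        List<String> out = new ArrayList<>(input);
        out.sort(Comparator.comparing(NormalizedSort::normalize));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of(
                "he llo", "goodbye", "hello", "= (goodbye)",
                "(he)(llo)", "good bye", "helium");
        sortByNormalized(words).forEach(System.out::println);
    }
}
```

This only brings strings together when their stripped forms are identical or happen to be alphabetical neighbours, which is why I don't think it is enough on its own.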
NOTE: there may be several acceptable outputs for this, one of which is:
DESIRED OUTPUT:
- hello
- he llo
- (he)(llo)
- helium
- goodbye
- good bye
- = (goodbye)
So my question is whether there is a Java package that compares strings by similarity and ultimately sorts them based on that.
I've heard of terms such as n-gram and skip-gram, but didn't quite understand them. I'm not even sure if they can be useful for me at all.
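As far as I can tell, a character n-gram is simply a run of n consecutive characters, so two strings could be compared by how many n-grams they share. The sketch below is my own guess at what that would look like with n = 2. The class `BigramSimilarity`, its method names, and the choice of Jaccard overlap (shared bigrams divided by all distinct bigrams) as the score are assumptions on my part, not something taken from a library:

```java
import java.util.HashSet;
import java.util.Set;

class BigramSimilarity {
    // Collect the distinct 2-character substrings of the string after
    // lower-casing it and removing non-alphanumeric characters.
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase().replaceAll("[^a-z0-9]", "");
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    // Jaccard overlap of the two bigram sets, between 0.0 and 1.0.
    // Two strings with no bigrams at all are treated as identical.
    static double similarity(String a, String b) {
        Set<String> ga = bigrams(a);
        Set<String> gb = bigrams(b);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 1.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        Set<String> all = new HashSet<>(ga);
        all.addAll(gb);
        return (double) shared.size() / all.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("hello", "he llo"));
        System.out.println(similarity("hello", "helium"));
        System.out.println(similarity("hello", "goodbye"));
    }
}
```

With this score, "hello" and "he llo" share every bigram, "hello" and "helium" share only "he" and "el", and "hello" and "goodbye" share none. Whether that is how n-grams are normally used for this kind of problem is exactly what I'm unsure about.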
UPDATE: finding similarities is certainly part of my question, but the main problem is the sorting part.
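To illustrate what I mean by the sorting part: a pairwise similarity score does not give a normal `Comparator`, so the ordering has to be built some other way. One naive idea is greedy chaining, sketched below: start with the first string and keep appending whichever remaining string is most similar to the one just placed. Everything here is hypothetical: `SimilarityOrder`, `order`, and the placeholder score `sharedPrefix` (length of the common prefix after stripping special characters) are names and choices I invented just to have something runnable, and any other score could be plugged in instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class SimilarityOrder {
    // Greedy chaining: begin with the first element, then repeatedly append
    // the remaining element with the highest similarity to the last one placed.
    // Ties go to the element that appears earlier in the input.
    static List<String> order(List<String> input,
                              ToDoubleBiFunction<String, String> similarity) {
        List<String> remaining = new ArrayList<>(input);
        List<String> result = new ArrayList<>();
        if (remaining.isEmpty()) {
            return result;
        }
        String current = remaining.remove(0);
        result.add(current);
        while (!remaining.isEmpty()) {
            int best = 0;
            double bestScore = similarity.applyAsDouble(current, remaining.get(0));
            for (int i = 1; i < remaining.size(); i++) {
                double score = similarity.applyAsDouble(current, remaining.get(i));
                if (score > bestScore) {
                    best = i;
                    bestScore = score;
                }
            }
            current = remaining.remove(best);
            result.add(current);
        }
        return result;
    }

    // Placeholder score (an assumption for this sketch): number of leading
    // characters the two strings share once non-alphanumerics are removed.
    static double sharedPrefix(String a, String b) {
        String x = a.toLowerCase().replaceAll("[^a-z0-9]", "");
        String y = b.toLowerCase().replaceAll("[^a-z0-9]", "");
        int n = 0;
        while (n < x.length() && n < y.length() && x.charAt(n) == y.charAt(n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> words = List.of(
                "he llo", "goodbye", "hello", "= (goodbye)",
                "(he)(llo)", "good bye", "helium");
        order(words, SimilarityOrder::sharedPrefix).forEach(System.out::println);
    }
}
```

For my list this gives he llo, hello, (he)(llo), helium, goodbye, = (goodbye), good bye, which is one of the orderings I would accept. It needs a similarity call for every pair, though, and the result depends on which string comes first, so I'd still prefer an existing package if one does this properly.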