I have a list of strings, for example:
["foo bar SOME baz TEXT bob",
"SOME foo bar baz bob TEXT",
"SOME foo TEXT",
"foo bar SOME TEXT baz",
"SOME TEXT"]
I want to sort it by how closely each string matches the substring SOME TEXT
(case doesn't matter). Something like this order:
["SOME TEXT",
"foo bar SOME TEXT baz",
"SOME foo TEXT",
"foo bar SOME baz TEXT bob",
"SOME foo bar baz bob TEXT"]
The idea is that the string whose words best match the positions of the substring's words gets the best score. The more "sloppy" words there are between the substring's words, the lower the string ranks.
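To make the rule concrete, here is a minimal sketch of the kind of scoring I have in mind, not a finished solution. It assumes whole-word, case-insensitive matching with the substring's words appearing in the same order. The name `gap_score` and the tie-break by total word count are illustrative assumptions, chosen only because they reproduce the order shown above.

```python
import math


def gap_score(text, query):
    """Return (extra words inside the tightest match, total words).

    Lower tuples sort first. Assumption: the query words must appear
    in order as whole words; strings without a match get math.inf.
    """
    words = text.lower().split()
    targets = query.lower().split()
    if not targets:
        return (0, len(words))

    best = math.inf
    for start, word in enumerate(words):
        if word != targets[0]:
            continue
        matched = 1   # how many query words are matched so far
        end = start   # index of the last matched word
        # Greedily match the remaining query words to the right of `start`.
        for idx in range(start + 1, len(words)):
            if matched == len(targets):
                break
            if words[idx] == targets[matched]:
                matched += 1
                end = idx
        if matched == len(targets):
            # Window length minus query length = "sloppy" words in between.
            best = min(best, (end - start + 1) - len(targets))

    # Assumed tie-break: shorter strings rank higher.
    return (best, len(words))


strings = [
    "foo bar SOME baz TEXT bob",
    "SOME foo bar baz bob TEXT",
    "SOME foo TEXT",
    "foo bar SOME TEXT baz",
    "SOME TEXT",
]
ranked = sorted(strings, key=lambda s: gap_score(s, "SOME TEXT"))
```

With strings of at most 20 words, the nested loop is cheap, so the same key function could presumably be passed to `sorted()` over rows already fetched from the database.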
I have found some libraries such as fuzzyset, and ones based on Levenshtein distance, but I'm not sure they are what I need. I know the exact substring I want to sort by, and as far as I understand, those libraries search for similar words.
Actually, I need to do this sort after a database query (PostgreSQL) in my Django project. I have already tried full-text search through the Django ORM, but didn't get this relevance order (it doesn't take the distance between the substring's words into account). Next I tried Haystack + Whoosh, but so far I haven't found out how to do this sort there either. So the idea now is to get the queryset and then sort it outside the database (yes, I know that might be a bad decision, but for now I just want it to work). But if anybody can tell me how to do this with any of the technologies I have mentioned here, that would also be super cool. Thank you!
P.S. The substring is expected to be 2-10 words long, in strings of at most 20 words.