3

Lets say I'm building a rudimentary search engine of sorts. I have a list of strings as the search results, and I want to order the list of search results with the best matching results at the top.

My current code looks like this (named parameters as examples)

import difflib
def order_by_best_match(search_results=["spam", "eggs", "spammy", "eggy"], search_query="spam"):

    for result in search_results:
        ratio = difflib.SequenceMatcher(None, result, search_query).ratio()

I don't know what to do with ratio after that. I know I have to sort the list by ratio, but how would I do that?

chyyran
  • 2,446
  • 2
  • 21
  • 35
  • 2
    Aside: using mutable arguments as default values is a [bad habit](http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument), so it's probably a good idea to avoid it, even here where it doesn't make much difference. – DSM Jul 28 '13 at 00:10
  • Just using that as an example. I wouldn't do this in production code :) – chyyran Jul 30 '13 at 03:33

2 Answers2

13
>>> import difflib
>>> a = ["spam", "eggs", "spammy", "eggy"]
>>> b = 'spam'
>>> sorted(a, key=lambda x: difflib.SequenceMatcher(None, x, b).ratio())
['eggy', 'eggs', 'spammy', 'spam']

Also, if you want the reverse order:

>>> sorted(a, key=lambda x: difflib.SequenceMatcher(None, x, b).ratio(), reverse=True)
['spam', 'spammy', 'eggs', 'eggy']
zhangyangyu
  • 8,520
  • 2
  • 33
  • 43
  • Could you please let me know how to use this method to sort a list of dicts by best match? – coda Mar 17 '16 at 22:07
  • @coda SequenceMatcher can only deal with hashable elements. Since dict is not hashable, you can not use it to directly to sort dicts. How to use it should depend on your usage. For example, use something in the dicts to sort. – zhangyangyu Mar 18 '16 at 05:40
  • Thanks @zhangyangyu! I've tried something like `sorted(result, key=lambda x: difflib.SequenceMatcher(None, x['artist_name'] + x['name'], query).ratio(), reverse=True)` and it's seems to give me (I guess) the desired results. I don't know why it works. Does it seem correct to you? – coda Mar 18 '16 at 08:57
  • @coda It looks OK to me if it meets your need. – zhangyangyu Mar 18 '16 at 09:12
4

The sorted function takes a key parameter, which you can use to determine how things are ranked. A common practice is to build a list of tuples, and then sort based on one element of the tuple.

for result in search_results:
    ratio = difflib.SequenceMatcher(None, result, search_query).ratio()
    weighted_results.append((result, ratio))

print weighted_results
print sorted(weighted_results, key=lambda x: x[1])

gives us

[('spam', 1.0), ('eggs', 0.25), ('spammy', 0.8), ('eggy', 0.0)]
[('eggy', 0.0), ('eggs', 0.25), ('spammy', 0.8), ('spam', 1.0)]
James Polley
  • 7,977
  • 2
  • 29
  • 33