I didn't realize that Python's set() function splits a string into individual characters. I wrote a Python function for Jaccard similarity that uses the set intersection method. It takes two sets, so before passing my two strings into the jaccard function I call set() on each string.
For example, assume I have the string "NEW Fujifilm 16MP 5x Optical Zoom Point and Shoot CAMERA 2 7 screen.jpg".
I would call set("NEW Fujifilm 16MP 5x Optical Zoom Point and Shoot CAMERA 2 7 screen.jpg"),
which separates the string into characters. So when I pass it to the jaccard function, the intersection is computed character by character instead of word by word. How can I do a word-to-word intersection?
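A minimal sketch of the difference, assuming Python 3: `set()` iterates over whatever it is given, and iterating a `str` yields one character at a time. Splitting the string into a list of words first (here with `str.split()`, which splits on runs of whitespace) gives a set of words instead. The names `s`, `char_set` and `word_set` are just for illustration.

```python
s = "NEW Fujifilm 16MP 5x Optical Zoom Point and Shoot CAMERA 2 7 screen.jpg"

# set() iterates over its argument; iterating a str yields single characters
char_set = set(s)

# str.split() with no arguments splits on runs of whitespace and returns a list of words
word_set = set(s.split())

print(sorted(word_set))
```

Note that `"screen.jpg"` stays a single token, because `split()` only breaks on whitespace.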
    # implementing jaccard
    def jaccard(a, b):
        c = a.intersection(b)
        return float(len(c)) / (len(a) + len(b) - len(c))
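One possible way to keep the existing `jaccard` function unchanged and still compare words is a small wrapper that splits each string before building the sets. This is a sketch: `jaccard_words` is a hypothetical helper name, the second title string is made up for the example, and returning 1.0 when both sets are empty is a convention chosen here to avoid dividing by zero, not a standard rule.

```python
def jaccard(a, b):
    # Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|
    c = a.intersection(b)
    union_size = len(a) + len(b) - len(c)  # |A ∪ B| by inclusion-exclusion
    if union_size == 0:
        return 1.0  # both sets empty; treating them as identical is a convention
    return float(len(c)) / union_size


def jaccard_words(s1, s2):
    # Split each string on whitespace so the sets hold words, not characters
    return jaccard(set(s1.split()), set(s2.split()))


s1 = "NEW Fujifilm 16MP 5x Optical Zoom Point and Shoot CAMERA 2 7 screen.jpg"
s2 = "Fujifilm 16MP Point and Shoot CAMERA"  # hypothetical second title
sim = jaccard_words(s1, s2)
print(sim)
```

Here all 6 words of `s2` appear among the 13 words of `s1`, so the similarity is 6/13 (about 0.46).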
If I don't call set() on my string "NEW Fujifilm 16MP 5x Optical Zoom Point and Shoot CAMERA 2 7 screen.jpg", I get the following error:
        c = a.intersection(b)
    AttributeError: 'str' object has no attribute 'intersection'
Instead of a character-to-character intersection, I want a word-to-word intersection and the resulting Jaccard similarity.
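If plain whitespace splitting is too strict (for instance `"CAMERA"` vs `"camera"`, or `"screen.jpg"` kept as one word), one alternative is to lowercase the text and extract word tokens with a regular expression. This is a hedged sketch, not the only reasonable tokenization: `tokenize` and `jaccard_tokens` are hypothetical helper names, the `\w+` pattern and the case folding are assumptions about what counts as "the same word", and the second string is made up for the example.

```python
import re


def tokenize(text):
    # Lowercase, then keep runs of word characters; "screen.jpg" -> {"screen", "jpg"}
    return set(re.findall(r"\w+", text.lower()))


def jaccard_tokens(s1, s2):
    a, b = tokenize(s1), tokenize(s2)
    union = a | b
    if not union:
        return 1.0  # both inputs have no tokens; a convention, as above
    return float(len(a & b)) / len(union)


s1 = "NEW Fujifilm 16MP 5x Optical Zoom Point and Shoot CAMERA 2 7 screen.jpg"
s2 = "new fujifilm camera screen"  # hypothetical second title
print(jaccard_tokens(s1, s2))
```

With plain `split()` these two strings share no words because of case and the `.jpg` suffix; with this tokenizer all 4 tokens of `s2` match, out of 14 tokens in the union.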