I have a data frame like this,
df
col1 col2
A 'the value is zero'
B 'this is a cat'
C 'the value is one'
D 'nothing is here'
E 'the colour is blue'
F 'this is dog'
G 'empty sequence'
H 'the colour is red'
I 'the colour is green'
Now I want to flag every string that is similar to at least one other string with 1 and all the others with 0, so the final data frame should look like this:
col1 col2 col3
A 'the value is zero' 1
B 'this is a cat' 1
C 'the value is one' 1
D 'nothing is here' 0
E 'the colour is blue' 1
F 'this is dog' 1
G 'empty sequence' 0
H 'the colour is red' 1
I 'the colour is green' 1
The 0 and 1 can be obtained with difflib's SequenceMatcher, i.e. SequenceMatcher(None, s1, s2).ratio(), and a threshold on that ratio decides whether the flag is 0 or 1.
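To make the idea concrete, here is a minimal sketch of the plain-loop version, using only the standard library. The helper name flag_similar and the threshold of 0.6 are assumptions made for illustration; the cutoff is an arbitrary value that happens to reproduce the flags shown above, not a recommended setting.

```python
from difflib import SequenceMatcher
from itertools import combinations

# The col2 values from the example data frame.
strings = [
    "the value is zero",
    "this is a cat",
    "the value is one",
    "nothing is here",
    "the colour is blue",
    "this is dog",
    "empty sequence",
    "the colour is red",
    "the colour is green",
]

THRESHOLD = 0.6  # assumed cutoff, chosen only for this example


def flag_similar(values, threshold):
    """Return 1 for each string that is similar to at least one other string, else 0."""
    flags = [0] * len(values)
    # Compare every unordered pair once: O(n^2) calls to SequenceMatcher.
    for i, j in combinations(range(len(values)), 2):
        if flags[i] and flags[j]:
            continue  # both already flagged, no need to compare again
        if SequenceMatcher(None, values[i], values[j]).ratio() >= threshold:
            flags[i] = flags[j] = 1
    return flags


flags = flag_similar(strings, THRESHOLD)
print(flags)  # [1, 1, 1, 0, 1, 1, 0, 1, 1]
```

With the data frame above, the result could be attached with something like df["col3"] = flag_similar(df["col2"].tolist(), THRESHOLD), where col3 is just a placeholder name for the flag column.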
But if I use for loops to compare every string with every other one, it takes a long time to execute. I am looking for a pandas shortcut or a more Pythonic way to do this efficiently.