Imagine that I have a list of tokens:
tokens_to_search = [
    'fox.com',
    'australia',
    'messi',
    'ronaldo',
    'British premier league'
]
I also have a sentence that may contain words related to the entries in tokens_to_search:
sentence = 'Messi scored a goal in the premier league, watch on the Fox News'
The sentence can be split into tokens:
tokens_from_sentence = [
    'messi',
    ...,
    'premier',
    'league',
    ...,
    'fox',
    'news'
]
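For illustration, here is a minimal sketch of how I get tokens_from_sentence. It assumes lowercasing and splitting on anything that is not a letter or digit; the helper name tokenize is just something I made up for this example, and any tokenizer would do:

```python
import re

sentence = 'Messi scored a goal in the premier league, watch on the Fox News'

def tokenize(text):
    # Hypothetical helper: lowercase the text, then keep runs of
    # letters and digits. Punctuation and whitespace are dropped.
    return re.findall(r'[a-z0-9]+', text.lower())

tokens_from_sentence = tokenize(sentence)
print(tokens_from_sentence)
```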
How can I detect which entries of tokens_to_search appear in tokens_from_sentence, using some kind of fuzzy search? The result should be:
[
    'fox.com',
    'messi',
    'British premier league'
]
The simple approach is a nested loop that computes some token distance for every pair, but that's O(N*M), where N is the number of search tokens and M is the number of sentence tokens. Is there a smarter way to do this?
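For reference, the nested-loop baseline I have in mind looks roughly like the sketch below. The choices here are my own assumptions, not requirements: difflib.SequenceMatcher as the similarity measure, a 0.8 threshold, and the rule that an entry counts as found if any one of its words is close to any sentence token. The names words and naive_fuzzy_search are made up for this example.

```python
import re
from difflib import SequenceMatcher

tokens_to_search = ['fox.com', 'australia', 'messi', 'ronaldo',
                    'British premier league']
sentence = 'Messi scored a goal in the premier league, watch on the Fox News'

def words(text):
    # Hypothetical helper: lowercase and split on non-alphanumerics,
    # so 'fox.com' becomes ['fox', 'com'].
    return re.findall(r'[a-z0-9]+', text.lower())

def naive_fuzzy_search(search_tokens, sentence_tokens, threshold=0.8):
    # Compares every word of every search entry with every sentence
    # token: this is the O(N*M) cost I would like to avoid.
    found = []
    for entry in search_tokens:
        if any(SequenceMatcher(None, w, t).ratio() >= threshold
               for w in words(entry)
               for t in sentence_tokens):
            found.append(entry)
    return found

result = naive_fuzzy_search(tokens_to_search, words(sentence))
print(result)
```

With this sample data it returns the three entries I expect, but the threshold and the "any word matches" rule are arbitrary and would need tuning on real input.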
Thanks in advance!