I need to write a SQL query that works somewhat like a fuzzy lookup: it should complete a join when a sentence within a paragraph in one table matches a sentence within a paragraph in a different table. I’m trying to identify similarities between two datasets that aren’t written exactly the same way. What’s the best plan of attack here?
I want to be able to join based on one sentence matching somewhere within the text and then analyze the results.
I tried parsing the text to compare just the unique identifiers, and this works in some cases. However, a lot of the data is messy and not written in a consistent structure.
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.para LIKE CONCAT('%', t1.para, '%')
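I tried this code and didn’t get the result I wanted: it only matches when the entire contents of t1.para appear verbatim somewhere inside t2.para, not when a single sentence from one paragraph appears in the other.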
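To make the goal concrete, here is a rough sketch of the behaviour I'm after, written in Python with an in-memory SQLite database so it can be run as-is. Everything in it is an assumption for illustration rather than my real setup: the helper names split_sentences and matching_pairs, the s1/s2 sentence tables, the (id, para) row shape, and the naive rule that a sentence ends at ".", "!" or "?" followed by whitespace. It only catches sentences that are identical after lowercasing and whitespace cleanup, so it is not true fuzzy matching.

```python
import re
import sqlite3


def split_sentences(paragraph):
    """Naively split a paragraph into normalized sentences.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    Each sentence is lowercased, inner whitespace is collapsed, and
    leading/trailing spaces and end punctuation are stripped.
    """
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    normalized = []
    for part in parts:
        cleaned = re.sub(r"\s+", " ", part).strip(" .!?").lower()
        if cleaned:
            normalized.append(cleaned)
    return normalized


def matching_pairs(table1_rows, table2_rows):
    """Return (t1_id, t2_id, sentence) for every sentence shared by two rows.

    Both arguments are lists of (id, para) tuples standing in for
    table1 and table2 (hypothetical shape).
    """
    conn = sqlite3.connect(":memory:")
    # One row per (paragraph id, normalized sentence).
    conn.execute("CREATE TABLE s1 (id INTEGER, sentence TEXT)")
    conn.execute("CREATE TABLE s2 (id INTEGER, sentence TEXT)")
    for table, rows in (("s1", table1_rows), ("s2", table2_rows)):
        for row_id, para in rows:
            conn.executemany(
                f"INSERT INTO {table} (id, sentence) VALUES (?, ?)",
                [(row_id, s) for s in split_sentences(para)],
            )
    # Plain equality join on the sentence instead of LIKE on the paragraph.
    result = conn.execute(
        """
        SELECT DISTINCT s1.id, s2.id, s1.sentence
        FROM s1
        JOIN s2 ON s1.sentence = s2.sentence
        ORDER BY s1.id, s2.id, s1.sentence
        """
    ).fetchall()
    conn.close()
    return result
```

With made-up rows such as (1, "The pump failed on Monday. Maintenance was called.") in the first table and (10, "Operator log: shift start.  the pump failed on monday!  Parts ordered.") in the second, I would want those two rows to join on the shared pump sentence even though the surrounding text, casing, and punctuation differ.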