0

I have a table to store book titles. Whenever I insert a new title I would like to check, if the title already exists. The problem is that, due to typos, an exact match is not possible.

For example

'My Dream of Wonder Land'
'My Deam of Wonder Land'

So my goal is to create Tags for each title, with every book having at most 3 tags. These tags are the 3 longest words of the title:

'My Dream of Wonder Land' = dream, wonder, land
'My Deam of Wonder Land'= deam, wonder, land

Now when I add a title, I would like to find all titles, with at least two matching tags. The order of tags should be ignored, meaning that

'land, dream, lego'

Should match as well.

What would be the best way to store the data in MySQL?

Anh Tuan Nguyen
  • 971
  • 3
  • 13
  • 28
  • You could try a fuzzy search to see if the book title exists. http://stackoverflow.com/questions/369755/how-do-i-do-a-fuzzy-match-of-company-names-in-mysql-with-php-for-auto-complete – fqhv Apr 21 '17 at 16:38
  • 1
    The problem is that you can have books with the same title. Matching should be done on ISBN numbers or something similar, not on title. – Shadow Apr 21 '17 at 17:01
  • @fqhv thanks, soundex sounds very promising. But if the order of words change it does not work. – Anh Tuan Nguyen Apr 21 '17 at 17:07

1 Answers1

0

I would recommend using fuzzy matching, as these comparisons can be very complex.

That said, your idea of using tags sounds much more fun.

I would probably have a table just for tags and store them vertically.

CREATE TABLE BookTag (
    BookId INT,
    Tag NVARCHAR(50))

Then when inserting books you would first calculate the tags and store them in a temp table. I would write a function that gets the nth largest word from a string. For this example I'm just going to assume this function exists and is called nthLargestWord(VARCHAR expression, INT n).

INSERT INTO TEMPORARY NewBookTag (Tag)
SELECT nthLargestWord(@booktitle, 1)
UNION
SELECT nthLargestWord(@booktitle, 2)
UNION
SELECT nthLargestWord(@booktitle, 3)

You can now join this table to your existing tag's table to see if any have 2 in common.

SELECT e.BookId
FROM NewBookTag n
INNER JOIN BookTag e ON n.Tag = e.Tag
GROUP BY e.BookId
HAVING COUNT(*) >= 2

You could just put this in an IF EXISTS an insert when it is false.

Also after inserting your book, you already have the tags prepared to insert as well!

INSERT INTO Book (BookTitle)
VALUES (@BookTitle)

INSERT INTO BookTag (BookId, Tag)
SELECT b.BookId, t.Tag
FROM Book b
INNER JOIN NewBookTag t ON b.Title = @booktitle

This solution allows you to insert one Book at a time, it could be refactored to allow for multiple if need be.

Community
  • 1
  • 1
fqhv
  • 1,191
  • 1
  • 13
  • 25