0

Working on a MySQL db which contains the Book Title, Author, Publication in a single column(field). The original db designer is said to put all those in single field to allow for easy search. Age-old client database, so I am NOT allowed to modify it.

Example:

Id BookDetails
1  A Tale of Two Cities|Charles Dickens|Penguin
2  And Then There Were None,Agatha Christie Random House
3  Adventures of Sherlock Holmes Arthur Conan Doyle-Harper Collins
4  Integrated Physics, Sarah Ann Barnes & Nobles
5  Integrated Electronics Millman & Halkias McGraw
6  Integrated Electronics|Millman|McGraw

As one can see from above, the book name, author name, pub names are all jumbled up - some fields they are pipe delimited, somewhere hyphen-delimited, & somewhere no or space delimited.

Current search allows for only 1 word in search query (like "Integrated" which will return row 4, 5 & 6). Among other requirements, I am tasked with extending the search to allow multi-words (Integrated Electronics Halkias - 3 words).

The implementation challenge I am facing is which SQL query will be best?

  1. If I go with BookDetails like '%Integrated%Electronics%Halkias%', it will match 5 perfectly, but will miss 6. But if someone changes to search query order to say 'Halkias Integrated Electronics' - it will fail to match anything
  2. If I go with breaking it into single word AND (like BookDetails like '%Integrated%' and BookDetails like '%Electronic%' and BookDetails like '%Halkias%') will only match 5 (while ideally it should match 6 as well).
  3. If I go with breaking it into single word AND (like BookDetails like '%Integrated%' OR BookDetails like '%Electronic%' OR BookDetails like '%Halkias%') will match 4, 5 & 6 (while ideally it should only match 5 & 6 only).
  4. Any other suggestions

I've checked existing SO threads (like How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete? - doesnt fit books need) and (Searching for names in a MySQL database that probably has typos - talks about pronunciations/typos which still does not fit books).

Any inputs on possible implementations please

levent001
  • 174
  • 7
  • 2
    You obviously know this, but concatenating multiple discrete values in a single column accomplishes exactly the opposite of allowing for easy [anything]. a DBMS works efficiently with normalised data, which this is not; I would seriously consider normalising the data, otherwise you will *always* be fighting it and paying the price with poor performance and concurrency. – Stu Apr 23 '23 at 08:39
  • 1
    "it will fail to match 6, but will match 4 (wrong!)." -> ?? It will only match `5`, see: [DBFIDDLE](https://dbfiddle.uk/uz504XkK) – Luuk Apr 23 '23 at 08:45
  • It's far from clear what your exact requirements and expected results are, perhaps add a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) to your question. – Stu Apr 23 '23 at 08:56
  • @Luuk - yeah. It was a typo. Have edited the q to correct it. Am trying your answer as well. Please allow some time (will delete this comment later) – levent001 Apr 23 '23 at 11:09

2 Answers2

1

Adding a score might help, to give the better results a higher place in the results:

SELECT 
  id,
  BookDetails,
  CASE WHEN BookDetails like '%Integrated%'  THEN 1 ELSE 0 END +
  CASE WHEN BookDetails like '%Electronic%'  THEN 1 ELSE 0 END +
  CASE WHEN BookDetails like '%Halkias%'  THEN 1 ELSE 0 END as Score
FROM books
WHERE BookDetails like '%Integrated%' 
   or BookDetails like '%Electronic%' 
   or BookDetails like '%Halkias%'
ORDER BY 3 DESC;

output:

id BookDetails Score
5 Integrated Electronics Millman & Halkias McGraw 3
6 Integrated Electronics|Millman|McGraw 2
4 Integrated Physics, Sarah Ann Barnes & Nobles 1

Or, adding a FULLTEXT INDEX (but that needs a change in your database).

ALTER TABLE books ADD FULLTEXT(BookDetails);

SELECT 
  id,
  BookDetails,
  MATCH (BookDetails)
     AGAINST ('Integrated Electronic Halkias' IN NATURAL LANGUAGE MODE) as Score
FROM books
WHERE  MATCH (BookDetails)
     AGAINST ('Integrated Electronic Halkias' IN NATURAL LANGUAGE MODE)
ORDER BY 3 DESC;

see: DBFIDDLE

Luuk
  • 12,245
  • 5
  • 22
  • 33
1

Two Things here:

  1. Since you can't change the DB entries or design, you are left with the only option to play around with your SQL Query PLUS the way you take the inputs from your user.

  2. I am guessing that order of words in the DB may also be random - like there can be entry like "Integrated Electronics Millman & Halkias McGraw", and same book can also be as "Millman Halkias McGraw Integrated Electronics" in your DB. Reason for mentioning is that you will have to keep your queries focused on SINGLE WORDS only, instead of combining them in any particular order. (Same for inputs from user side - they might put words in any order!)

In that context, the following might help:

When taking more than one word as inputs (like via a HTML form), try to take those in TWO different fields which are clearly classified to the user as "MUST Contain Words" and "MAY Optionally Contain Words"

In above example, user can give "Integrated Electronic" in MUST CONTAIN words field, and give "Halkias" as MAY CONTAIN words field.

Then in your SQL Query, you can put a AND on all the words in the MUST field, and some combo of OR in the MAY field.

where ((bookdet like '%Integrated%') AND (bookdet like '%Electronic%'))

& rank the rows higher which also have "Halkias" in the field (using order by).

With no changes possible on DB side, it will depend on the way you collect the details from the user & on the way you create your query.

Aquaholic
  • 863
  • 9
  • 25