-1

I'm writing a query that is supposed to retrieve words from "Words table" if they are contained in sentences in the "Sentences table"

For example: the query should output "hello" if it finds at least one sentence that contains the word "hello"

I was able to write this query so far :

SELECT DISTINCT (words.word) FROM sentences inner join words on sentences.sentence LIKE CONCAT('% ', words.word ,' %')

The issue with this query that it's super slow, like it took 8hours+ and did not output any results given that the words table is around 250k rows and the sentence table is around 1M rows. Can anyone help with a faster solution.

  • 3
    You want your result to be any word that appears in any sentence, and every sentence it appears in, right? That join, unless mysql has some fancy optimiazion, is going to look at every sentence for every word. Using a relational table it might make sense to have a table called sentence_words, and every time a sentence is added to sentences, give it a foreign key, then find the key of the word (or add it), then sentence_words winds up as a many-to-many.... does that make any sense? Others might suggest technologies better suited. Maybe edit your post to say what you're doing with output. – Levin Magruder Jul 15 '22 at 01:47
  • So can can I tweak the query to move to the next word if it finds at least a first match instead of going through all the sentences? – Rakan Selawi Jul 15 '22 at 02:18
  • 1
    Please show your table structure and indexes – Rohit Gupta Jul 15 '22 at 03:39
  • 1
    Please provide enough code so others can better understand or reproduce the problem. – Community Jul 15 '22 at 03:39
  • [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/q/3191623/3404097) [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [mre] [ask] [Help] [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) etc etc – philipxy Jul 15 '22 at 04:43
  • RakanSelawi - With just those two talbes I don't see a way to get a sammple senetence for each word. You'd have to have some kind of ETL code to analyze the sentences and create a third table. Here is someone else asking the same question and I think the answers are good (pretty much what maitreyee_stack said): https://stackoverflow.com/questions/9122752 – Levin Magruder Jul 15 '22 at 09:40

1 Answers1

0

With the information you shared, the data is not extremely large, but the 'sentence' table is going thru full-tables-can for each distinct word in 'words' table.
Also distinct has a processing span involved.
It would be prudent to index the words & sentence table
( probably words on initiating Letter & Sectence on same) & partition them too based on same criteria. \

Then run same query as above.

This approach might reduce how much data being joined per key match.