0
select substring_index(SUBSTRING_INDEX(title, ' ', title+1), ' ',-1) as word ,
       COUNT(*) AS counter
from feed_collections
group by word
ORDER BY counter DESC;

The table has 1785123 rows and I thing this is the problem.

This is the error query (1690): BIGINT value is out of range in '(feed_collections.title + 1)' and I don't know how to fix it.
The query worked until around 1500000 rows.

The table contains 3 columns: title(text), url(text), date(datetime).
The code is finding most common words in column title Example: Table

+----------------------------------+-----------------+
| title                            | url             |
+----------------------------------+-----------------+
| the world of ukraine             |          www.ab |
| count the days until christmas   |    www.abc.com  |
| EU and NATO wants to use bombs   |    www.abcd.com |
| Ukraine needs help from NATO     |    www.abce.com |
+----------------------------------+-----------------+

Result

+------+-------+
| word | total |
+------+-------+
| nato |    5  |
| of   |    14 |
| and  |    11 |
| To   |     9 |
| that |     7 |
| ukraine    |     2 |
| EU   |     1 |
+------+-------+

I adapted the code from here: How to find most popular word occurrences in MySQL?

The proof that query works This works with small data. Seems to be a problem when tries to filter large data.

What I'm trying to achive in the future is to find most common words in the title column grouped by 1,2,3,4,5,6,7 words.
It will exists a select box to select how many words to use.
Example:
I will select to find most common words with 4 words.
Title: 1. Nato is using force , 2. Eu and Nato is using force.
Results with 4 words:
'nato is using force' found 2 times in title.

Any idea how to fix or how to do a query for this? I'm working with laravel, a solution would be to create a php method...

Sergiu
  • 61
  • 8
  • 2
    *`.. SUBSTRING_INDEX(title, ' ', title+1) ..`* - what is `title`??? it is mentioned in 1st and 3rd parameters which looks like an error. – Akina Mar 10 '22 at 19:23
  • 2
    your query doesn't make any sense, provide data for title – nbk Mar 10 '22 at 19:23
  • what does `select version();` show? – ysth Mar 10 '22 at 19:29
  • 1
    using title+1 as the third parameter to substring_index makes no sense at all. show some sample data – ysth Mar 10 '22 at 19:30
  • 3
    I don't think your query was ever doing what you expected, if you wanted to count each word in each title separately. you are getting the error now because some title starts with a long number, not because of the number of rows. – ysth Mar 10 '22 at 19:34
  • You need to use a user defined function or recursive cte to count the number of words. – Salman A Mar 10 '22 at 19:53
  • What is the arithmetic "title+1" supposed to mean?? Is `title` a numeric column? – Rick James Mar 11 '22 at 01:27
  • @SalmanA a user defined function isn't useful for this that I know of. can you expand on that? – ysth Mar 11 '22 at 01:59
  • @ysth use a while loop over a string, use instr to find the position of space starting from previous occurence until no more are found. – Salman A Mar 11 '22 at 02:45
  • I added new details in the ticket to be more clear. – Sergiu Mar 11 '22 at 06:12
  • @SalmanA but that doesn't take a title "foo bar" and produce two rows "foo" and "bar" – ysth Mar 11 '22 at 18:00
  • @ysth a combination of stored procedure, cursors, temporary tables. – Salman A Mar 11 '22 at 20:01
  • go back to the question you adapted from: nothing there does anything like your passing title+1 which makes no sense; instead they pass a number indicating which word to extract from the title and do a join to make that number go over a range. with your example data, your query only gets the first word: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=50723ddfd9ab11a122fca871b8acfefe – ysth Mar 11 '22 at 20:30
  • Again, your error is not because of too much data, it is because you are passing the wrong thing as the third parameter to substring_index, and when title happens to start with a large number, it will give an error (as well as never doing what you actually want, regardless of the data) – ysth Mar 11 '22 at 20:30
  • also again, what does `select version();` show? – ysth Mar 11 '22 at 20:31
  • @ysth, version: 8.0.28-0 ubuntu0.20.04.3 It makes sense what you say about the large number, it possible to have in db that kind of data. "(as well as never doing what you actually want, regardless of the data)" .. I know, but I wanted to start from something, although I need to be able to specify how many words are the most searched phrases without specifying those words. :) But still, I don't have a clue how to fix or remake a query to do what I want, that's why I wrote the post. If you have an idea how to achive that, please help me :)) – Sergiu Mar 12 '22 at 05:51
  • I found this in the title where the query is give me error: "8967E32C-2E39-4630-B58B-DFC02DDC94C6" Like you said, seems to be considered a large number – Sergiu Mar 12 '22 at 06:23

0 Answers0