1

I need a MySQL query to return a full sentence from a text column that contains a specified search word.

Currently I am able to get the 20 characters before and after the search word using this query:

select id, MID(body,(LOCATE('search_word', body)-20),40) from content where body like "%search_word%" limit 1

, but that's as far as I've got.

I want to get an entire sentence (between two dots) which contains my search word.

Any ideas? Regex? How do I go about doing this?

jjj
  • 2,594
  • 7
  • 36
  • 57
  • 1
    What if a sentence contains a "dot" within it? Language parsing is a non-trivial matter. – David Nov 15 '15 at 16:54
  • Wouldn't be too concerned about that. As close as it gets is good enough for me. – jjj Nov 15 '15 at 18:55

2 Answers2

0

Why don't you just get the whole field with mysql and filter out the sentence in an actual programming language.

A javascript example would look like this: https://jsfiddle.net/n0wfgjoc/

var text = "Lorem Ipsum is simply dummy text ... versions of Lorem Ipsum.";

var search = "popularised in the";
var pattern = new RegExp('\. ([^.]*' + search + '[^.]*\.)', 'i');

document.getElementsByTagName('body')[0].innerHTML = text.match(pattern)[1];

You should not hve a problem adapting i to your needs - and your language.

It should be much more performant than doing this it in pure SQL.

EDIT:

As @David pointed out, it might be a problem, if there were dots used in the text in other contexts - for abbreviations or dates maybe.

Solving that would be a hard task. My example does not cover that use case.

Stefan Dochow
  • 1,454
  • 1
  • 10
  • 11
  • 1
    I have 120m+ records in the db and many of the articles saved in the txt field are couple of pages long passages. Thus I've only wanted to return relevant sentence from the DB. To do a full text return, then passing each complete article into the code and programatically parsing it, would probably become a huge memory overload. Anyhow, thanks a lot for trying. I guess I should have said that in my example, but I did specifically ask for a MySQL query. – jjj Nov 15 '15 at 18:55
  • Well the amount of records does not change the fact, that SQL is a query language. Too complex operations should not be done in SQL but in a proper programming language that comes with more performant string operators. There is a REGEX oprator in mysql, maybe you want to try that (https://stackoverflow.com/questions/9099469/mysql-select-like-or-regexp-to-match-multiple-words-in-one-record). But still there is the issue @David pointed out: even if you accomplish your task and even if you can live with the poor performance, there might be dots that do not mean the end of a sentence. – Stefan Dochow Nov 15 '15 at 19:49
  • With an amount of data like you describe, I assume, there is no need to evaluate all that live: I would propose selecting the articles via "LIKE %...%", evaluate each in a proper programming language and store already looked up text snippets in a kind of cache table. – Stefan Dochow Nov 15 '15 at 19:52
0

In PostgreSQL you can do this using regexp_matches, and I believe in MySQL this would be REGEXP_SUBSTR, see also: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr.

WJH
  • 539
  • 5
  • 14