2

Let's say I have a random text, such as this one:

Lorem ipsum dolor sit amet consectetur adipiscing elit.

Using MySQL, I'd like to search for a given keyword and get a return along with immediate words surrounding my keyword.

Example:

Select * from table where keyword like "%amet%"

This returns entire text. But I need a query which would return only following text:

sit amet consectetur

So, the word I've search for, plus previous and next word.

--

Additionally, In case I search for keyword elit, return should be (as there is no word after 'elit'):

adipiscing elit

In case I search for keyword Lorem, return should be (as there is no word before 'Lorem'):

Lorem ipsum

Can this be somehow done? Using regex or detection of spaces, using MID or some sort of a substring?

jjj
  • 2,594
  • 7
  • 36
  • 57
  • 2
    My initial thought would be to do this in the calling language rather than mysql if possible. Could be straightforward to to get the adjacent words you want once you have the row out of the database. – dwxw Oct 02 '13 at 15:43
  • I`m not sure but it might be possible with full text search. – Mihai Oct 02 '13 at 15:44
  • I don't want to get entire string, just a small part of it. Reason being is that the text in my database is simply huge... – jjj Oct 02 '13 at 15:46
  • BTW. This article discusses Regex for such purposes, I just don't know how would I enter it as a MySQL query... http://stackoverflow.com/questions/8250987/extract-x-number-of-words-surrounding-a-given-search-string-within-a-string – jjj Oct 02 '13 at 15:47
  • I am trying something like this: SELECT * FROM text WHERE old_text REGEXP '((?:\w+\s*){0,5}Lorem(?:\s*\w+){0,5})' limit 1 ; but it doesn't want to work, referring to some issue with sql command. – jjj Oct 02 '13 at 15:53

2 Answers2

1

I've had a play and I can give you half a solution in pure mysql.

You can get the string either side of the word you're after using this. Just don't know how to get the word rather than the whole substring. Hopefully it's helpful.

select case when (select w.t regexp concat('[[:<:]]', w.v)) = 1 
    then substr(w.t, 1, locate(w.v, w.t)-1) else null end as 'left_word',
       w.v as word,
       case when (select w.t regexp concat(w.v, '[[:>:]]')) = 1 
    then substr(w.t, locate(w.v, w.t)+length(w.v)) else null end as 'right_word'
    from (
        select "Lorem ipsum dolor sit amet consectetur adipiscing elit." as t, "amet" as v
    ) as w;

select case when (select w.t regexp concat('[[:<:]]', w.v)) = 1 
    then substr(w.t, 1, locate(w.v, w.t)-1) else null end as 'left_word',
       w.v as word,
       case when (select w.t regexp concat(w.v, '[[:>:]]')) = 1 
    then substr(w.t, locate(w.v, w.t)+length(w.v)) else null end as 'right_word'
    from (
        select "Lorem ipsum dolor sit amet consectetur adipiscing elit." as t, "elit." as v
    ) as w;
dwxw
  • 1,089
  • 1
  • 10
  • 17
  • That is great. Can you use this Regex I've implemented for this case: http://regexr.com?36irf – jjj Oct 02 '13 at 16:33
  • I like that regex because it allows me to specify how many surrounding keywords I want to extract. This is the regex: ((?:[\w"',.-]+\s*){0,1}["',.-]?amet["',.-]?(?:\s*[\w"',.-]+){0,1}) – jjj Oct 02 '13 at 16:34
  • I think you're not going to win because you really need capture groups to get the words you're looking for. Mysql [doesn't appear](http://stackoverflow.com/questions/6617996/simulating-regex-capture-groups-in-mysql) to support this, it only tells you if the string matches or not. You don't know how long the words are either side of your search term so you can't get the substring. As I said in my first comment, this would be far easier in a 'real' programming language. ;-/ – dwxw Oct 03 '13 at 08:45
  • Too bad, I didn't know MySQL doesn't support regex. Regex I've mentioned above would take care of extracting words. It works nicely, except I don't know how to use it with MySQL. – jjj Oct 03 '13 at 12:25
0

This returns the text but doesn't really care about words per se.

select  result.title,
        concat('... ',mid(title,greatest(1,position("Farther" in result.title)-100),least(length("Farther")+200,length(title))),' ...') as 
preview,
        result.score
from
        (select
                title,
                match(title) against('farther' in boolean mode) as score
        from articles
        where match(title) against('farther' in boolean mode)
        order by score desc) resultselect  result.title,
        concat('... ',mid(title,greatest(1,position("Farther" in result.title)-100),least(length("Farther")+200,length(title))),' ...') as 
preview,
        result.score
from
        (select
                title,
                match(title) against('farther' in boolean mode) as score
        from articles
        where match(title) against('farther' in boolean mode)


+--------------------------------------+-------------------------------------+-------------------+
| title                                | preview                                      | score             |
+--------------------------------------+-------------------------------------+-------------------+
| Morality Sees Farther Than Intellect | ... Morality Sees Farther Than Intellect ... | 16.48005485534668 |
+--------------------------------------+-------------------------------------+-------------------+
1 row in set (0.002 sec)

Don't think it works with multiple matches, just the first one.