33

Need help to form the MYSQL query from table one column having the bellow content

Row1 : this is first <a href='mytext.txt'>row</a> from the table

Row 2 : THis is the second row <img src ='mytext.jpg'> my image is there

Row 3 : <p>This is the Third row my mytext is there </p>

Row 4 : <p class='te_mytext'>This is the Third row my text is there </p>

this is the table rows i try to search the keyword as 'mytext'

my query is

SELECT * from table  WHERE colmn_name ` like '%mytext%' "

I will get all the 4 rows as result but the result is wrong. I need to get the correct output as only Row 3. The reason this row only having the mytext inside the content all other are not in content but mytext having in all rows

How can I write the MySQL query?

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
Subha
  • 741
  • 2
  • 9
  • 23

10 Answers10

53

try this solution: not tried it myself but apparently it works.

source: http://forums.mysql.com/read.php?52,177343,177985#msg-177985

   SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC 
BEGIN
  DECLARE iStart, iEnd, iLength int;
    WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
      BEGIN
        SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
        SET iLength = ( iEnd - iStart) + 1;
        IF iLength > 0 THEN
          BEGIN
            SET Dirty = Insert( Dirty, iStart, iLength, '');
          END;
        END IF;
      END;
    END WHILE;
    RETURN Dirty;
END;
|
DELIMITER ;
SELECT fnStripTags('this <html>is <b>a test</b>, nothing more</html>');
KB.
  • 3,549
  • 4
  • 23
  • 29
  • or you could just have separate column/field with pre-stripped content and then use that to search for your keywords - of course that would duplicate loads of data. – KB. Jun 07 '10 at 14:47
  • 2
    If using this function, remember to increase the Dirty varchar to a high value, I choose 1000000 to be on the safe side for my project. – milan Dec 14 '11 at 00:40
  • For running this under phpmyadmin I have to remove first line because of permissions (or sth. like that) – jmarceli Sep 12 '13 at 02:14
  • this won't work in cases where you have `<` or `>` as part of your data. For example `SELECT fnStripTags('10 is > 9 and < 11, nothing more') AS clean;` returns `10 is > 9 and`. – ronan_mac Feb 24 '15 at 11:42
  • 2
    Remember that you may need to set a character set for a return value, otherwise the MySQL will cast a string to its default charset and you may loose some data. RETURNS varchar(4000) CHARSET utf8 – MakG Feb 27 '15 at 08:31
  • With regards to the comment by ronan_mac, I would imagine this wouldn't come up as a problem, as if you are storing HTML, then a > or < in the actual text should be stored as its HTML entity value (e.g. >). – Tom Folk May 17 '16 at 10:56
  • For UTF8 character set. You need to do following code with your function `CHARSET utf8` after `varchar(4000)` at both lines – Md. Khalakuzzaman Khan Oct 31 '16 at 10:52
30

Here's my implementation of a strip_tags function:

CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    LOOP
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;

I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set

Enjoy.

Boann
  • 48,794
  • 16
  • 117
  • 146
  • This is neat. I changed this a little to remove content from encoded html. Thanks @Boann. – srrvnn Jun 07 '14 at 07:42
  • 6
    Also to note, you may want to put a `SET $str = COALESCE($str, '');` just before the loop otherwise null values may cause a crash/never ending query. – Tom C Aug 17 '15 at 09:51
  • You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IF (!$start) THEN RETURN $str' at line 1 – tfont May 03 '18 at 14:47
  • ^^^ TO FIX ABOVE: Add a 'DELIMITER'. See other answers for example usage. – tfont May 03 '18 at 14:50
  • @tfont Add `DELIMITER $$` at the beginning of the function. Append `$$` to `END` so the last line will become `END$$` Finally at the end add this line `DELIMITER ;` – Saif Jul 03 '18 at 11:38
  • Working on MySql 5.5.53 – Iván Martínez Jul 11 '18 at 07:48
  • This worked perfectly (with the adjustments suggested in the comments) to calculate post word count on my Wordpress database. Thanks – Dagmar Dec 13 '22 at 09:46
14

If your content always start with tags(<body> etc.)

try this:

SELECT * from table  WHERE colmn_name REGEXP  '>[^<]*mytext';
Ryan Ou
  • 331
  • 3
  • 6
6

Regular expression matching is needed here, with negative look-ahead assertion: "mytext" not followed by a closing tag. MySQL still, as of 8.0, doesn't support look assertions, but MariaDB does. MariaDB query for the question:

SELECT * FROM table WHERE column_name REGEXP 'mytext(?![^<>]*>)';

Another solution for the question is to remove some / all tags before matching. It's inefficient, compared with REGEXP, but also works. In MySQL starting with 8.0 and MariaDB starting with 10.0.5, there is built-in REGEXP_REPLACE function. 'strip_html' is even the first example in MariaDB corresponding documentation page. MySQL / MariaDB query for such approach:

SELECT * FROM table WHERE REGEXP_REPLACE (column_name, '<.+?>', '') LIKE '%mytext%';

Apart from that, strings in the question mix data and representation. They shouldn't be regularly searched as it is a waste of system resources.

  • That seems to be only for the main HTML tags. It wouldn't work unless you specified everything single possible HTML tag and even then that won't work since new tags could be added at any time and the query would have to be updated accordingly. – The Unknown Dev Sep 28 '16 at 19:41
  • @Kimberly W No, MariaDB example works for any tag. Concrete tags there are given in the 'subject' argument (where to replace) of REGEXP_REPLACE. The 'pattern' argument (what to replace) is '<.+?>', which corresponds to any tag. –  Jun 02 '18 at 03:17
  • By far the best solution in my opinion! Worked straight away with MariaDB 10.6.9. Much easier to implement than the other recommendations. – rf1234 Aug 19 '22 at 11:30
3

Add these lines to fnStripTags function
after SET Dirty = Insert( Dirty, iStart, iLength, '');

set Dirty = Replace(Dirty,'&nbsp;',''); #No space between & and nbsp;
set Dirty = Replace(Dirty,'\r','');
set Dirty = Replace(Dirty,'\n','');
000
  • 3,976
  • 4
  • 25
  • 39
ZAky
  • 31
  • 2
2

I used strip_tags() above (Thanks Boann) with a slight mod to be rid of the html codes of accented chars etc. Like so:

...BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = REPLACE($str, "&nbsp;", " ");
    SET $str = REPLACE($str, "&euro;", "€");
    SET $str = REPLACE($str, "&aacute;", "á");
    SET $str = REPLACE($str, "&eacute;", "é");
    SET $str = REPLACE($str, "&iacute;", "í");
    SET $str = REPLACE($str, "&oacute;", "ó");
    SET $str = REPLACE($str, "&uacute;", "ú");
LOOP...
cucu8
  • 891
  • 10
  • 12
2

MySQL strip tags implementation, allowing you to target a specific tag so that we can replace out the tags one by one with each function call. You just need pass the tag parameter, e.g. 'a' to replace out all opening/closing anchor tags.

# MySQL function to programmatically replace out specified html tags from text/html fields

# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;

DELIMITER |

# function to nuke all opening and closing tags of type specified in argument 2
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
        SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    END LOOP;
END;

| DELIMITER ;

# test select to nuke all opening <a> tags
SELECT 
    STRIP_TAGS(description, 'a') AS stripped
FROM
    tmpcat;

# run update query to replace out all <a> tags
UPDATE tmpcat
SET 
    description = STRIP_TAGS(description, 'a');
ajmedway
  • 1,492
  • 14
  • 28
2

Expanded the strip_tags function as provided by Boann. It now can be used to either keep or ignore the phrase between the tags. Note the bug with empty tags in my part of the code, ie. for $keep_phrase = false.

    CREATE FUNCTION strip_tags($str text, $tag text,$keep_phrase bool) RETURNS text
    BEGIN
        DECLARE $start, $end INT DEFAULT 1;
        SET $str = COALESCE($str, '');
        LOOP
            SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
            IF (!$start) THEN RETURN $str; END IF;
            IF ($keep_phrase) THEN
                SET $end = LOCATE('>', $str, $start);
                IF (!$end) THEN SET $end = $start; END IF;
                SET $str = INSERT($str, $start, $end - $start + 1, '');
                SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
            ELSE
                SET $end = LOCATE(CONCAT('</', $tag, '>'),$str,$start);
                IF (!$end) THEN 
                    SET $end = LOCATE('/>',$str,$start); 
                    SET $str = INSERT($str, $start, $end - $start + 2, '');
                ELSE 
                    SET $str = INSERT($str, $start, $end - $start 
                       + LENGTH(CONCAT('</', $tag, '>')), '');
                END IF;
            END IF;
        END LOOP;
    END //

To prove the functionality:

    SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',true);
    <p>so very cool</p><p>so very cool</p>

    SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',false);
    <p>so cool</p><p>so cool</p>

Empty elements with $keep_phrase = false is not supported, see:

    SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><span/><p>so<span id="y"> very</span> cool</p>','span',false);
    <p>so cool</p> cool</p>
  • 1
    Above function does work well, but only after fixing the CREATE statement. MySQL does not have a BOOL data type (I usually use TINYINT instead). I copied the code between BEGIN and END and used the "Add Function" interface in PHPMyAdmin to create instead - I can confirm the function works as intended on my data. – Djimmr Mar 23 '22 at 14:30
  • This is how you can set it up in PHPMyAdmin: https://imgur.com/a/in7Fmey – Richard Peck Dec 31 '22 at 09:46
2

I think there is no need for such extraordinary logic. It can be simplified like...

-- set @StrRow = 'this is first <a href="mytext.txt">row</a> from the table';
-- set @StrRow = 'THis is the second row <img src ="mytext.jpg"> my image is there';
 set @StrRow = '<p>This is the Third row  my mytext is there  </p>';
-- set @StrRow = '<p class="te_mytext">This is the Third row  my text is there  </p>';

set @MyText = 'mytext';

select locate('<', @StrRow, locate(@MyText, @StrRow)) as '<', locate(@MyText, @StrRow) as MyText, locate('>', @StrRow, locate(@MyText, @StrRow)) as '>'
from xyz
 where
 locate('<', @StrRow, locate(@MyText, @StrRow)) > 0 and 
 locate('<', @StrRow, locate(@MyText, @StrRow)) < locate('>', @StrRow, locate(@MyText, @StrRow))
TarakPrajapati
  • 479
  • 3
  • 12
-3

You cannot parse HTML inside of SQL queries, that doesn't make any sense. Perhaps you could just maintain a special search version of the table with all the HTML stripped, but you would have to use some external processing to do that.

Egor Pavlikhin
  • 17,503
  • 16
  • 61
  • 99
  • ok in php i get this 4 records but i need to show case only that correct record (row3) how can i do but the result come 4 records but i need to display the particular 3d result only how can i parse form PHP – Subha Apr 13 '10 at 08:22