MySQL get string(s) between two # / multiple pairs of #

Question

How can I find string between two # or multiple pairs of #.

An example text to search: This is #important# and needs to elaborated further. Remember to buy #milk before coming home#.

I want results to be:

important

milk before coming home

score 2 · Answer 1 · answered Sep 18 '16 at 15:59

2

You can try this code

<?php
    $s =  'This is #important# and needs to elaborated further. Remember to buy #milk before coming home#';
    for($i=0;$i<strlen($s);$i++)
        if($s[$i] == "#") {
        $i++;
        $str="";
        while($s[$i]!='#') {
            echo $s[$i];
            $i++;
        }
        echo '<br>';
    };
    ?>

then you will get your result

important

milk before coming home

answered Sep 18 '16 at 15:59

Allloush

1,140
8
27

But how can I do it directly with MySQL query? – Jouni Lahtinen Sep 18 '16 at 16:11
1

After you get your sql statement executed, loop over the result and do the same previous code – Allloush Sep 18 '16 at 18:23
I'll probably just get the whole stuff from database and use php to parse it and get what I want, thanks – Jouni Lahtinen Sep 18 '16 at 18:38

score 2 · Answer 2 · edited May 23 '17 at 10:27

Edit 1

create table t91
(   id int auto_increment primary key,
    thing varchar(1000) not null
);
insert t91(thing) values
('This is #important# and needs to elaborated further. Remember to buy #milk before coming home#'),
('This is #important# and needs to elaborated further. Remember to buy #milk home#'),
('This is #');


select id,thing,theCount
from
(   SELECT id,thing,
        ROUND (   
            (
                LENGTH(thing)
                - LENGTH( REPLACE ( thing, "#", "") ) 
            ) / LENGTH("#")        
        ) AS theCount    
    FROM t91
) d
where d.theCount>1
-- 2 rows returned (id 1 and 2)

The above Edit used a strategy poached from This Answer

Edit2

-- truncate table t91;
create table t91
(   id int auto_increment primary key,
    thing varchar(1000) not null
);
insert t91(thing) values
('This is #important# and needs to elaborated further. Remember to buy #milk before coming home#'),
('This is #important# and needs to elaborated further. Remember to buy #milk home#'),
('This is #'),
('This is #1# ... #2# ... #');

Function:

DROP FUNCTION IF EXISTS findInsideHashMarks;
DELIMITER $$
CREATE FUNCTION findInsideHashMarks(s VARCHAR(200),segments INT)
RETURNS VARCHAR(200)
BEGIN
    DECLARE i,nextPos,i1,i2 INT;
    DECLARE sOut VARCHAR(200);
    SET i=0;
    SET nextPos=1;
    SET sOut='';
    WHILE i<segments DO
        -- LOCATE(substr,str,pos)
        SET i1=LOCATE('#',s,nextPos);
        IF i1>0 THEN
            SET i1=i1+1;
            SET nextPos=i1+1;
            SET i2=LOCATE('#',s,nextPos);
            IF i2>0 THEN
                SET nextPos=i2+1;
                SET i2=i2-1;
                SET sOut=CONCAT(sOut,SUBSTRING(s,i1,i2-i1+1));
            END IF;
        END IF;
        SET i=i+1;
        IF i<segments THEN
            SET sOut=CONCAT(sOut,',');
        END IF;
    END WHILE;
    RETURN sOut;
END$$
DELIMITER ;

Query:

SELECT id,
SUBSTRING_INDEX(SUBSTRING_INDEX(segString, ',', n.digit+1), ',', -1) dummy
FROM
(   select id,thing,theCount,cast(floor(theCount / 2) as unsigned) as segments,
    findInsideHashMarks(thing,cast(floor(theCount / 2) as unsigned)) as segString
    FROM
    (   SELECT id,thing,
            ROUND (   
                (   LENGTH(thing)
                    - LENGTH( REPLACE ( thing, "#", "") ) 
                ) / LENGTH("#")        
            ) AS theCount    
        FROM t91
    ) d1
    where d1.theCount>1
) d2
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) n
ON LENGTH(REPLACE(d2.segString, ',' , '')) <= LENGTH(d2.segString)-n.digit
ORDER BY d2.id,n.digit;

Output:

+----+-------------------------+
| id | dummy                   |
+----+-------------------------+
|  1 | important               |
|  1 | milk before coming home |
|  2 | important               |
|  2 | milk home               |
|  4 | 1                       |
|  4 | 2                       |
+----+-------------------------+

The digits breakout of a set (a,b,c) to rows was inspired (understatement) by this Answer from User fthiella. As noted in that Answer from fthiella (or not expressly so), the UNION strategy could be expanded to pick up, say, 10 or more chunks of data between # markers.

Also note the prior attribution at the bottom of Edit1.

Edit3

An offshoot off of Edit2's function and query, but using a permanent Helper Table to avoid the UNION in the query for derived table n. It supports up to 100 # segments. Uses the prior (Edit2) t91 table and the Function.

Schema:

CREATE TABLE 4kTable
(   -- a helper table of about 4k consecutive ints
    id int auto_increment primary key,
    thing int null
)engine=MyISAM;

insert 4kTable (thing) values (null),(null),(null),(null),(null),(null),(null),(null),(null);
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
insert 4kTable (thing) select thing from 4kTable;
-- verify:
-- select min(id),max(id),count(*) from 4kTable;
-- 1 4608 4608

ALTER TABLE 4kTable ENGINE = InnoDB; -- *********** it is now InnoDB

-- verify:
-- select min(id),max(id),count(*) from 4kTable;
-- 1 4608 4608
-- no innodb auto_increment gaps (consecutive block)

Query:

SELECT id, 
SUBSTRING_INDEX(SUBSTRING_INDEX(segString, ',', n.digit+1), ',', -1) dummy 
FROM 
(   select d1.id,thing,theCount,cast(floor(theCount / 2) as unsigned) as segments, 
    findInsideHashMarks(thing,cast(floor(theCount / 2) as unsigned)) as segString 
    FROM 
    (   SELECT id,thing, 
            ROUND (   
                (   LENGTH(thing) 
                    - LENGTH( REPLACE ( thing, "#", "") )  
                ) / LENGTH("#")         
            ) AS theCount     
        FROM t91 
    ) d1 
    where d1.theCount>1 
) d2 
INNER JOIN
(select id-1 as digit from 4kTable where id<=100) n
ON LENGTH(REPLACE(d2.segString, ',' , '')) <= LENGTH(d2.segString)-n.digit
ORDER BY d2.id,n.digit;

Same output as above Edit2 Output section. I usually have a few permanent Helper Tables in a system, especially for left joins with dates.

But what if you just know you need find something between #...# and you do not know what? — Jouni Lahtinen, Sep 18 '16 at 16:49
@Drew why don't you use [`CHAR_LENGTH(str)`](http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_char-length)? — Paul Spiegel, Sep 18 '16 at 17:08
@PaulSpiegel cuz I am in the middle of a lot of stuff and sought a solution :p — Drew, Sep 18 '16 at 17:25
Sorry guys but don't get it. I have table called Event and column called description in there. I need to find all occurrences (strings) which are delimited by #....# from the description. # in the beginning and # at the end of the string. Example text is just one row (description) in the table. So I want 2 rows as a result "important" and "milk before coming home". Sorry for being so vague in the first place... — Jouni Lahtinen, Sep 18 '16 at 18:13
well load up my t91 and run the query. Two out of the 3 rows qualify with a `# ...#` thing. Determine if you can accept a stored proc solution. It would take the (count mod 2) and know the segment count. And loop. Also, how would call the proc with parameters. Devil's in the details. — Drew, Sep 18 '16 at 18:20
I tried it, thanks. It is not exactly what I'm looking for though... I guess I need some more trial and error + reading some documentation. — Jouni Lahtinen, Sep 18 '16 at 18:26
Here's a working [sqlfiddle](http://sqlfiddle.com/#!9/7415f/5) with CHAR_LENGTH(). BTW: Nice trick using MyISAM for the sequence table. — Paul Spiegel, Sep 19 '16 at 08:20

MySQL get string(s) between two # / multiple pairs of #

2 Answers2

Edit 1

Edit2

Edit3

Linked