17

I found this function, which returns the data between two strings in a piece of text (HTML or anything else).

How can it be changed so that it finds all occurrences, i.e. the data between every pair of $start [some-random-data] $end? I want every [some-random-data] in the document (the data will be different each time).

function getStringBetween($string, $start, $end) {
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}
Ivar
user3778578

6 Answers

51

One possible approach:

function getContents($str, $startDelimiter, $endDelimiter) {
  $contents = array();
  $startDelimiterLength = strlen($startDelimiter);
  $endDelimiterLength = strlen($endDelimiter);
  $startFrom = $contentStart = $contentEnd = 0;
  while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
    $contentStart += $startDelimiterLength;
    $contentEnd = strpos($str, $endDelimiter, $contentStart);
    if (false === $contentEnd) {
      break;
    }
    $contents[] = substr($str, $contentStart, $contentEnd - $contentStart);
    $startFrom = $contentEnd + $endDelimiterLength;
  }

  return $contents;
}

Usage:

$sample = '<start>One<end>aaa<start>TwoTwo<end>Three<start>Four<end><start>Five<end>';
print_r( getContents($sample, '<start>', '<end>') );
/*
Array
(
    [0] => One
    [1] => TwoTwo
    [2] => Four
    [3] => Five
)
*/ 

Demo.
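
To put in words and numbers what the loop does, here is a minimal sketch that runs the same two strpos lookups but also records the offsets it finds. The names traceContents, from, to and text are made up for this illustration; they are not part of the answer above.

```php
<?php
// Hypothetical tracing variant of getContents(): same loop, but every
// match is stored together with the offsets where its content starts and ends.
function traceContents(string $str, string $startDelimiter, string $endDelimiter): array
{
    $trace = [];
    $startFrom = 0;
    while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
        // Skip over the start delimiter itself.
        $contentStart += strlen($startDelimiter);
        // Look for the end delimiter only after the start delimiter.
        $contentEnd = strpos($str, $endDelimiter, $contentStart);
        if (false === $contentEnd) {
            break; // unclosed block: stop searching
        }
        $trace[] = [
            'from' => $contentStart,
            'to'   => $contentEnd,
            'text' => substr($str, $contentStart, $contentEnd - $contentStart),
        ];
        // The next search resumes right after this end delimiter.
        $startFrom = $contentEnd + strlen($endDelimiter);
    }
    return $trace;
}

foreach (traceContents('a[x]b[yz]', '[', ']') as $step) {
    printf("%d..%d => %s\n", $step['from'], $step['to'], $step['text']);
}
// prints:
// 2..3 => x
// 6..8 => yz
```

Because every lookup resumes where the previous one stopped, the search only moves forward through the string and the matches never overlap.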

raina77ow
  • Can someone explain to me what searching algorithm was used to implement this? And can someone explain to me in words what this function precisely does? – user3326078 Jan 08 '19 at 11:06
  • This works flawlessly just the way you wrote it. Thank you so much! It finds every occurrence. For example, if you are looking for the text between two html tags, get the html page as a string. Then set $startDelimiter to the first tag and $endDelimiter to the second tag. The function will return every instance of what is between the two tags in an array. – Greg G Apr 15 '20 at 02:02
  • @Erdss4 At each iteration the loop first finds the next position of $startDelimiter, then the position of $endDelimiter. Each lookup starts from the position left by the previous iteration. That's the general description; for details, ask a more specific question. – raina77ow May 07 '20 at 14:37
  • This answer is missing its educational explanation. – mickmackusa Jul 29 '21 at 21:04
  • beautiful. so sad that since SO became popular, one just doesn't see good answers like this – csaw Jul 24 '23 at 07:50
10

You can do this using regex:

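function getStringsBetween($string, $start, $end)
{
    // Pass the pattern delimiter to preg_quote so that a "/" inside
    // $start or $end (e.g. "</p>") is escaped as well.
    $pattern = sprintf(
        '/%s(.*?)%s/',
        preg_quote($start, '/'),
        preg_quote($end, '/')
    );
    preg_match_all($pattern, $string, $matches);

    return $matches[1];
}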
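
A hedged variant of the same idea, assuming the content between the delimiters may span several lines: the s modifier lets . match newlines, and preg_quote receives the pattern delimiter as its second argument. The function name getStringsBetweenMultiline is hypothetical. preg_match_all returns false on failure (PCRE limits such as pcre.backtrack_limit are one possible cause on very large inputs), so the sketch checks for that instead of silently returning nothing.

```php
<?php
// Hypothetical multi-line variant of the regex approach.
function getStringsBetweenMultiline(string $string, string $start, string $end): array
{
    // preg_quote(..., '/') escapes regex metacharacters and the "/" delimiter.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

    if (false === preg_match_all($pattern, $string, $matches)) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }

    return $matches[1];
}

print_r(getStringsBetweenMultiline("<b>one</b>\n<b>two\nlines</b>", '<b>', '</b>'));
// Dollar signs and carets work as delimiters too, because they are quoted:
print_r(getStringsBetweenMultiline('$a^ $b^', '$', '^'));
```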
ifm
  • this is good for standard strings but bad for things like using dollar signs and carets as delimiters because it messes up the regex – Robert Pounder Nov 19 '15 at 09:20
  • for some reason, `preg_match_all` doesn't work on large strings; I tried to get everything from a table. – bareMetal Mar 08 '21 at 06:04
3

I like to use explode to get the string between two strings. This function also works for multiple occurrences.

function GetIn($str,$start,$end){
    $p = array(); // initialise so an empty array is returned when nothing matches
    $p1 = explode($start,$str);
    for($i=1;$i<count($p1);$i++){
        $p2 = explode($end,$p1[$i]);
        $p[] = $p2[0];
    }
    return $p;
}
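
One thing to be aware of with the explode approach: a segment that has a start delimiter but no end delimiter is still returned, up to the end of the string. Below is a minimal sketch of a stricter variant that skips such segments. The name getInStrict is hypothetical, and both delimiters are assumed to be non-empty (explode rejects an empty separator).

```php
<?php
// Hypothetical strict variant: only keep segments that are actually closed.
function getInStrict(string $str, string $start, string $end): array
{
    $found = [];
    $segments = explode($start, $str);
    array_shift($segments); // drop the text before the first start delimiter

    foreach ($segments as $segment) {
        // Limit 2: split at the first end delimiter only.
        $parts = explode($end, $segment, 2);
        if (count($parts) === 2) {
            $found[] = $parts[0];
        }
    }

    return $found;
}

print_r(getInStrict('[a] x [b] y [c', '[', ']'));
/*
Array
(
    [0] => a
    [1] => b
)
*/
```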
Shamim
2

I needed to find all the occurrences between a specific start and end tag, change them somehow, and get the changed string back.

So I added this small piece of code after the function from raina77ow's approach.

        $sample = '<start>One<end> aaa <start>TwoTwo<end> Three <start>Four<end> aaaaa <start>Five<end>';
        $sample_temp = getContents($sample, '<start>', '<end>');
        $i = 1;
        foreach($sample_temp as $value) {
            $value2 = $value.'-'.$i; // here you can change the value
            // replace the whole tagged block with the changed value
            $sample = str_replace('<start>'.$value.'<end>', $value2, $sample);
            $i++;
        }
        echo $sample;

Now the tags have been removed from the output, and each string between them has a number appended, like this:

One-1 aaa TwoTwo-2 Three Four-3 aaaaa Five-4

But you can do anything else with them. Maybe this will be helpful for someone.
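
One caveat with the str_replace loop: if two blocks contain the same text, the first pass replaces both of them, so they end up with the same number. A possible alternative, sketched here with preg_replace_callback, changes every block in a single pass. The function name replaceBetween and its callback signature (content, running index) are assumptions made for this example.

```php
<?php
// Hypothetical helper: replace every $start...$end block with the value
// returned by $fn(content, index), where index counts from 1.
function replaceBetween(string $str, string $start, string $end, callable $fn): string
{
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    $i = 0;

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($fn, &$i) {
            $i++;
            return $fn($m[1], $i); // $m[1] is the text between the delimiters
        },
        $str
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $str;
}

$sample = '<start>One<end> aaa <start>One<end>';
echo replaceBetween($sample, '<start>', '<end>', fn($value, $i) => $value . '-' . $i);
// prints: One-1 aaa One-2
```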

Grows
1

There are some great solutions here, but they were not perfect for extracting parts of code from, say, HTML, which was my problem: I need to get the script blocks out of the HTML before compressing it. So, building on @raina77ow's original solution, as expanded by @Cas Tuyn, I came up with this one:

$test_strings = [
    '0<p>a</p>1<p>b</p>2<p>c</p>3',
    '0<p>a</p>1<p>b</p>2<p>c</p>',
    '<p>a</p>1<p>b</p>2<p>c</p>3',
    '<p>a</p>1<p>b</p>2<p>c</p>',
    '<p></p>1<p>b'
];

/**
* Separate a block of code into sub blocks. Example: pulling all <script>...</script> blocks out of HTML code
*
* @param string $str text block
* @param string $startDelimiter string to match for the start of a block to be extracted
* @param string $endDelimiter string to match for the end of a block to be extracted
* @return array ['in' => all full blocks, 'out' => what's left of the string]
*/
function getDelimitedStrings($str, $startDelimiter, $endDelimiter) {
    // 'in' holds full blocks (delimiters included), 'out' the text between them.
    $contents = array('in' => array(), 'out' => array());
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $outStart = 0; // where the current "outside" segment begins
    while (false !== ($blockStart = strpos($str, $startDelimiter, $outStart))) {
        $contentEnd = strpos($str, $endDelimiter, $blockStart + $startDelimiterLength);
        if (false === $contentEnd) {
            break; // unclosed block: the rest of the string counts as outside text
        }
        // Non-empty text between the previous block and this one.
        if ($blockStart > $outStart) {
            $contents['out'][] = substr($str, $outStart, $blockStart - $outStart);
        }
        // The full block, from the start delimiter through the end delimiter.
        $blockEnd = $contentEnd + $endDelimiterLength;
        $contents['in'][] = substr($str, $blockStart, $blockEnd - $blockStart);
        $outStart = $blockEnd;
    }
    // Whatever follows the last complete block.
    if ($outStart < strlen($str)) {
        $contents['out'][] = substr($str, $outStart);
    }

    return $contents;
}

foreach($test_strings AS $string){
    var_dump( getDelimitedStrings($string, '<p>', '</p>') );
}

This will extract all <p> elements, including any inner HTML, giving this result:

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=4)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)
    3 => string '3' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=3)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=3)
    0 => string '1' (length=1)
    1 => string '2' (length=1)
    2 => string '3' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=2)
    0 => string '1' (length=1)
    1 => string '2' (length=1)

array (size=2)
'in' => array (size=1)
    0 => string '<p></p>' (length=7)
'out' => array (size=1)
    0 => string '1<p>b' (length=5)

You can see a demo here: 3v4l.org/TQLmn

Kim Steinhaug
0

I also needed the text outside the pattern. So I changed the answer from raina77ow above a little:

function get_delimited_strings($str, $startDelimiter, $endDelimiter) {
    $contents = array();
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $startFrom = $contentStart = $contentEnd = $outStart = $outEnd = 0;
    while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
        // The outside text ends where the start delimiter begins.
        $outEnd = $contentStart;
        $contentStart += $startDelimiterLength;
        $contentEnd = strpos($str, $endDelimiter, $contentStart);
        if (false === $contentEnd) {
            break;
        }
        $contents['in'][] = substr($str, $contentStart, $contentEnd - $contentStart);
        $contents['out'][] = substr($str, $outStart, $outEnd - $outStart);
        $startFrom = $contentEnd + $endDelimiterLength;
        $outStart = $startFrom;
    }
    // Everything after the last complete pair is outside text.
    $contents['out'][] = substr($str, $outStart);
    return $contents;
}

Usage:

    $str = "Bore layer thickness [2 mm] instead of [1,25 mm] with [0,1 mm] deviation.";
    $cas = get_delimited_strings($str, "[", "]");

gives:

array(2) { 
    ["in"]=> array(3) { 
        [0]=> string(4) "2 mm" 
        [1]=> string(7) "1,25 mm" 
        [2]=> string(6) "0,1 mm" 
    } 
    ["out"]=> array(4) { 
        [0]=> string(21) "Bore layer thickness " 
        [1]=> string(12) " instead of " 
        [2]=> string(6) " with " 
        [3]=> string(11) " deviation." 
    } 
}
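
For comparison, the same inside/outside split can be sketched with preg_split and the PREG_SPLIT_DELIM_CAPTURE flag, which returns the captured content interleaved with the text around it. The function name splitInOut is hypothetical; this is an alternative technique, not the code used in the answer above.

```php
<?php
// Hypothetical alternative: split on "$start ... $end" and keep the captures.
function splitInOut(string $str, string $start, string $end): array
{
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    $parts = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = ['in' => [], 'out' => []];
    if (false === $parts) {
        return $result; // PCRE error: nothing could be split
    }

    foreach ($parts as $index => $part) {
        // With one capture group the pieces alternate:
        // even indexes are outside text, odd indexes are captured content.
        $result[$index % 2 === 0 ? 'out' : 'in'][] = $part;
    }

    return $result;
}

var_dump(splitInOut('Bore [2 mm] instead of [1,25 mm] end.', '[', ']'));
```

Note that outside segments can be empty strings here, for example when the input starts with a start delimiter.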
Cas Tuyn