There were some great solutions here, but none were perfect for extracting parts of code from, say, HTML, which was my problem: I need to pull the script blocks out of the HTML before compressing it. So, building on @raina77ow's original solution, as expanded by @Cas Tuyn, I came up with this:
$test_strings = [
    '0<p>a</p>1<p>b</p>2<p>c</p>3',
    '0<p>a</p>1<p>b</p>2<p>c</p>',
    '<p>a</p>1<p>b</p>2<p>c</p>3',
    '<p>a</p>1<p>b</p>2<p>c</p>',
    '<p></p>1<p>b'
];
/**
 * Separate a block of code into sub-blocks. Example: removing all <script>...</script> tags from HTML code.
 *
 * @param string $str            Text block
 * @param string $startDelimiter String marking the start of a block to be extracted
 * @param string $endDelimiter   String marking the end of a block to be extracted
 * @return array ['in' => all full blocks, 'out' => what is left of the string]
 */
function getDelimitedStrings($str, $startDelimiter, $endDelimiter) {
    $contents = ['in' => [], 'out' => []];
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $offset = 0; // position just past the last extracted block

    while (false !== ($blockStart = strpos($str, $startDelimiter, $offset))) {
        $contentEnd = strpos($str, $endDelimiter, $blockStart + $startDelimiterLength);
        if (false === $contentEnd) {
            break; // unterminated block: leave it in the remaining text
        }
        $blockEnd = $contentEnd + $endDelimiterLength;

        // Text between the previous block and this one, if any
        if ($blockStart > $offset) {
            $contents['out'][] = substr($str, $offset, $blockStart - $offset);
        }
        // The full block, delimiters included
        $contents['in'][] = substr($str, $blockStart, $blockEnd - $blockStart);
        $offset = $blockEnd;
    }

    // Whatever follows the last complete block
    if ($offset < strlen($str)) {
        $contents['out'][] = substr($str, $offset);
    }
    return $contents;
}
foreach ($test_strings as $string) {
    var_dump(getDelimitedStrings($string, '<p>', '</p>'));
}
This will extract all <p> elements, including any inner HTML, giving this result:
array (size=2)
  'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
  'out' => array (size=4)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)
    3 => string '3' (length=1)
array (size=2)
  'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
  'out' => array (size=3)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)
array (size=2)
  'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
  'out' => array (size=3)
    0 => string '1' (length=1)
    1 => string '2' (length=1)
    2 => string '3' (length=1)
array (size=2)
  'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
  'out' => array (size=2)
    0 => string '1' (length=1)
    1 => string '2' (length=1)
array (size=2)
  'in' => array (size=1)
    0 => string '<p></p>' (length=7)
  'out' => array (size=1)
    0 => string '1<p>b' (length=5)
You can see a demo here: 3v4l.org/TQLmn
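
For the use case I mentioned at the top (keeping script blocks out of the way while compressing the rest), here is a minimal, self-contained sketch of the same strpos-based scan. The function name compressHtmlKeepingScripts is made up for this example, and the whitespace-collapsing preg_replace() is only a stand-in for whatever compressor you actually use. It assumes '<script' as the start delimiter (a prefix, so tags with attributes match too) and uses stripos() so the match is case-insensitive.

```php
<?php
/**
 * Hypothetical helper: collapse whitespace in HTML but copy <script> blocks verbatim.
 * The preg_replace() call is only a placeholder for a real HTML compressor.
 */
function compressHtmlKeepingScripts(string $html): string {
    $open = '<script';    // prefix only, so <script type="..."> is matched as well
    $close = '</script>';
    $result = '';
    $offset = 0;          // position just past the last copied script block

    while (false !== ($blockStart = stripos($html, $open, $offset))) {
        $contentEnd = stripos($html, $close, $blockStart + strlen($open));
        if (false === $contentEnd) {
            break; // unterminated block: the rest is handled as plain HTML below
        }
        $blockEnd = $contentEnd + strlen($close);

        // Compress the markup before the block, then append the block untouched
        $result .= preg_replace('/\s+/', ' ', substr($html, $offset, $blockStart - $offset));
        $result .= substr($html, $blockStart, $blockEnd - $blockStart);
        $offset = $blockEnd;
    }

    // Compress whatever follows the last complete script block
    return $result . preg_replace('/\s+/', ' ', substr($html, $offset));
}

$html = "<div>\n  <p>Hi</p>\n</div>\n<script>\n  var a = 1;\n</script>\n<p>\n  Bye\n</p>";
echo compressHtmlKeepingScripts($html);
```

Note that a plain prefix match like '<script' would also catch any other tag whose name starts with "script", and that string scanning knows nothing about comments or quoted attributes, so for untrusted or messy markup a real HTML parser is probably the safer choice.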