2

I am trying to extract all values from a text that lie between two delimiters, e.g. <p> and <\/p>. Right now I can extract only the first one.

public function get_string_between($string, $start, $end)
{
    $string = ' ' . $string;
    $ini = strpos($string, $start);
    if ($ini == 0) return '';
    $ini += strlen($start);
    $len = strpos($string, $end, $ini) - $ini;
    return substr($string, $ini, $len);
}

$fullstring = '[{"content":{"content":"<h1>Acceptances<\/h1>","numbering":""},"children":[{"content":{"content":"<p><span>Ownership of the Products remains with the [X] and will not pass to the [Y] until one of the following events occurs:<\/span><\/p>","numbering":""},"children":[{"content":{"content":"<p><span>The [X] is paid for all of the Products and no other amounts are owed by the [Y] to the [X] in respect of other Products supplied by the [X].<\/span><\/p>","numbering":""},"children":[]},{"content":{"content":"<p><span>The [Y] sells the Products in accordance with this agreement in which case ownership of the Products will pass to the [Y] immediately before the Products are delivered to the [Y]&#039;s customer.<\/span><\/p>","numbering":""},"children":[]}]},{"content":{"content":"<p><span>Where the Products are attached to or incorporated in other Products or are altered by the [Y], ownership of the Products shall not pass to the [Y] by virtue of the attachment, incorporation or alteration if the Products remain identifiable and, where attached to or incorporated in other Products, can be detached or removed from them.<\/span><\/p>","numbering":""}';

$paragraph_start_1 = '<p>';
$paragraph_end_2 = '<\/p>';
$paragraph = $this->get_string_between($fullstring, $paragraph_start_1, $paragraph_end_2);
// The output is just the first match, but I need all of them.
Douwe de Haan
Beusebiu

2 Answers

1

Use regex instead:

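public function get_string_between($string, $start, $end)
{
    // Add the opening delimiter and escape $start/$end so that characters
    // such as "/" and "\" are matched literally. The "s" flag lets "." span newlines.
    $re = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

    preg_match_all($re, $string, $matches, PREG_SET_ORDER, 0);

    // Each entry holds the full match at index 0 and the captured text at index 1.
    return $matches;
}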

If you want to test the regex:

// The backslash is optional, so both </p> and the JSON-escaped <\/p> are matched.
$re = '/<p>(.*?)<\\\\?\/p>/m';
$str = '<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam pulvinar sollicitudin risus, et aliquam ante efficitur non. Pellentesque vel lorem euismod, efficitur turpis eu, vehicula tellus. Aliquam pretium nulla a ex sollicitudin fringilla. Praesent lacus nibh, consequat nec imperdiet nec, volutpat id lacus. Suspendisse tristique nisl sapien, imperdiet lobortis lectus vulputate dapibus. Curabitur vulputate enim felis. Curabitur vehicula risus et nisi vehicula luctus. Quisque id urna ut sem volutpat accumsan. Curabitur ut odio faucibus massa ultricies auctor. Curabitur id vulputate mi, dignissim varius turpis. In hac habitasse platea dictumst. Proin suscipit ex ut neque facilisis pellentesque. Ut et efficitur sapien.</p>
<p>Nulla facilisi. Phasellus maximus dui sed maximus sodales. Aliquam imperdiet est a elit sollicitudin, id lobortis lectus vehicula. Sed ut accumsan ligula. Maecenas id scelerisque risus, non pharetra nisi. Praesent rhoncus sem turpis, sed fermentum orci aliquet et. Sed vitae turpis id eros commodo maximus. Praesent fringilla eros nisl, ac cursus mauris iaculis vel. Donec vulputate ornare augue eget pulvinar.</p>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);
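As a minimal sketch of how this could be applied to a JSON-escaped string like the one in the question: the helper name `extract_all_between` and the shortened `$sample` string are assumptions made for illustration, not part of the original code. It returns only the captured texts as a flat array.

```php
<?php
// Hypothetical helper: returns every substring found between $start and $end.
function extract_all_between(string $string, string $start, string $end): array
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter,
    // so '<\/p>' is matched as the literal characters < \ / p >.
    $re = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

    // Default PREG_PATTERN_ORDER: $matches[1] holds all first-group captures.
    preg_match_all($re, $string, $matches);

    return $matches[1];
}

// Shortened, made-up sample in the same escaped style as $fullstring.
$sample = '{"content":"<p><span>First<\/span><\/p>"},{"content":"<p><span>Second<\/span><\/p>"}';

$paragraphs = extract_all_between($sample, '<p>', '<\/p>');
print_r($paragraphs);
```

The non-greedy `(.*?)` is what keeps each match from running on to the last closing tag in the string.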
Douwe de Haan
  • I tried this, but in the result only the first match was correct, and it was concatenated with the rest of the file. I edited the post to include $fullstring as an example, thanks! – Beusebiu Mar 24 '20 at 09:33
  • @Beusebiu the regex works like a charm for me, [see it in action here](https://regex101.com/r/iNpukT/1). Maybe you forgot the multiline and global flags? – Douwe de Haan Mar 24 '20 at 09:37
  • I hope I didn't do something wrong, but I tested it at regexr.com/510ej, and the result is the same as here: everything after <p>. – Beusebiu Mar 24 '20 at 09:50
  • @Beusebiu You're right, forgot one character: [new regex here](https://regex101.com/r/iNpukT/3). This one is working as you expected. – Douwe de Haan Mar 24 '20 at 10:13
  • $re = '/<p>(.*?)<\\\\\/p>/m'; is what should replace the regex in your answer. Thank you, now it works! – Beusebiu Mar 24 '20 at 10:32
1

Only use regex for this kind of problem if you are absolutely sure the input string always follows the same format, for example: always exactly one <p>, but at an unknown position.

Otherwise, extract the text using the native DOM or XML parsers. See this extensive answer: How do you parse and process HTML/XML in PHP?
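A minimal sketch of the parser-based route, assuming the data is valid JSON and the `dom` extension is available. The helper name `paragraph_texts` and the flat `$json` sample are made up for illustration; the real structure in the question is nested (and the posted string appears to be cut off), so it would need a recursive walk instead.

```php
<?php
// Hypothetical helper: collects the text of every <p> element in an HTML fragment.
function paragraph_texts(string $html): array
{
    // Collect libxml warnings internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // textContent drops nested tags such as <span> and keeps the text.
        $texts[] = $p->textContent;
    }

    return $texts;
}

// json_decode() turns the escaped "<\/p>" back into "</p>" before parsing.
$json = '[{"content":"<p><span>First<\/span><\/p>"},{"content":"<p><span>Second<\/span><\/p>"}]';

$texts = [];
foreach (json_decode($json, true) as $item) {
    $texts = array_merge($texts, paragraph_texts($item['content']));
}

print_r($texts);
```

Decoding the JSON first means the HTML parser sees ordinary `</p>` tags, so no special handling of the backslash is needed.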

Piemol
  • I edited the post with a real string. This is the type of string that I must work with, and to extract all paragraphs from it. – Beusebiu Mar 24 '20 at 09:36