I have an HTML string in a variable, which looks something like this:

<h1>Title 1</h1>
 Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>
 <h2>Chapter 2</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>

<h1>Title 2</h1>
Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>
 <h2>Chapter 2</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p> 

For further processing I need these "blocks" in a variable (an array). First of all, the main chapters should be separated: each one starts with an <h1> and runs up to the next <h1>.

I tried explode() with the delimiter <h1, but that removes the delimiter, and with it part of the tag itself.

As a second step, I also need to separate the chapters (<h2>) within each "block". As a last step, I need to get the description of each chapter.

I think the key is the first step: splitting the whole string into an array of main chapters. After that I can process the "sub-blocks" in a foreach loop (or something similar) with the same technique, I guess.

Sky
user3142695
  • If your HTML string has newline characters, you can use explode with the delimiter "\n" (in double quotes, so PHP reads it as a newline) – Ollie Strevel Sep 19 '14 at 19:26
  • So you used explode, but it removes the delimiter. Maybe the question "[Is there way to keep delimiter while using php explode or other similar functions?](http://stackoverflow.com/questions/2938137/is-there-way-to-keep-delimiter-while-using-php-explode-or-other-similar-function)" is of interest to you. – GolezTrol Sep 19 '14 at 19:29
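The idea from that linked question can be sketched with preg_split() and a lookahead, which splits in front of each tag instead of consuming it. This is only a sketch: splitBefore() is a made-up helper name, and the shortened $html is sample input, not the full string from the question.

```php
<?php
// splitBefore() is a hypothetical helper; $html is shortened sample input.
function splitBefore(string $html, string $tag): array
{
    // (?=<h1\b) matches the empty position right before "<h1" without consuming it
    $pattern = '/(?=<' . preg_quote($tag, '/') . '\b)/i';
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_NO_EMPTY);
    // trim every piece, drop the ones that were whitespace only, and reindex
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

$html = "<h1>Title 1</h1>\nIntroduction\n<h2>Chapter 1</h2>\n<p>Always just one line</p>\n"
      . "<h1>Title 2</h1>\nIntroduction\n<h2>Chapter 1</h2>\n<p>Always just one line</p>\n";

$blocks = splitBefore($html, 'h1');
print_r($blocks);
```

Calling splitBefore($blocks[0], 'h2') on one of the resulting blocks would split that block again at every <h2, with the title and introduction as the first piece.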

2 Answers

No problem, you can use explode(). It removes the <h1 delimiter, but you can easily add it back yourself like this:

<?php
$html = '<h1>Title 1</h1>
     Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>

    <h1>Title 2</h1>
    Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
    ';

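$html = explode('<h1', $html);
// put the delimiter that explode() removed back in front of every piece
for ($i = 0; $i < count($html); $i++) $html[$i] = '<h1' . $html[$i];
unset($html[0]); // the string starts with <h1, so element 0 holds nothing but the re-added '<h1'
print_r($html);  // var_dump() returns nothing, so wrapping it in print_r() was pointless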

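BTW, the first element only needs to be removed when nothing precedes the first <h1 (as in your string, which begins with <h1). Instead of the unconditional unset() you can check for that after the loop. Don't put the check inside the for loop: unsetting an element there shrinks count($html) and the loop would stop before the last element.

if (trim($html[0]) === '<h1') unset($html[0]);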

Answer to your comment:

If you want to split at <h2 too, you can do the same thing again inside each block, with <h2 as the delimiter:

<?php
$html = '<h1>Title 1</h1>
     Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>

    <h1>Title 2</h1>
    Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
    ';

$html = explode('<h1', $html);
for ($i = 0; $i < count($html); $i++) $html[$i] = '<h1' . $html[$i];

// h2: split every <h1 block again; element 0 of each block is the title plus introduction

for ($i = 0; $i < count($html); $i++) {
    $html[$i] = explode('<h2', $html[$i]);
    // every piece after the first one followed an <h2 delimiter, so put it back
    for ($j = 1; $j < count($html[$i]); $j++) $html[$i][$j] = '<h2' . $html[$i][$j];
}
unset($html[0]); // the piece before the first <h1
print_r($html);
Sky
  • Thanks. That really helps. How would you do the second split for <h2 to get a multidimensional array? I mean the var html now has the main chapter - which works great. But I want to split the main chapter into the h2-chapter and put everything into one variable. – user3142695 Sep 19 '14 at 19:51
  • Really last question: How can I get the content/text of the h1 or h2 tag? Just, e.g., 'Title 2' or 'Chapter 1'. – user3142695 Sep 19 '14 at 20:36
  • @user3142695 You can iterate over all lines in the array (or just in the `for`) and replace all `…` – Sky Sep 19 '14 at 20:46
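For the heading text asked about in the comments, one possible sketch is a small regex capture followed by strip_tags(). headingText() is a hypothetical helper name and $block is assumed sample input; it is not necessarily what the comment above had in mind.

```php
<?php
// headingText() is a hypothetical helper; $block is assumed sample input.
function headingText(string $block, string $tag): ?string
{
    $t = preg_quote($tag, '~');
    // capture everything between the first opening and closing tag of that name
    if (preg_match('~<' . $t . '\b[^>]*>(.*?)</' . $t . '>~is', $block, $m)) {
        return trim(strip_tags($m[1])); // remove nested markup and surrounding whitespace
    }
    return null; // no such heading in this block
}

$block = "<h1>Title 2</h1>\nIntroduction\n<h2>Chapter 1</h2>\n<p>Always just one line</p>";
echo headingText($block, 'h1'), "\n"; // Title 2
echo headingText($block, 'h2'), "\n"; // Chapter 1
```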

As mentioned in the comment, you could explode("\n", $string) (double quotes, because '\n' in single quotes is a literal backslash followed by n) and then iterate over all lines, switching to the next chapter whenever strpos($line, '<h1>') !== false.
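A minimal sketch of that loop, assuming every heading sits on its own line; $string here is shortened sample input, and $chapters and $current are illustrative names:

```php
<?php
// Shortened sample input; assumes each <h1> is on a line of its own.
$string = "<h1>Title 1</h1>\nIntroduction\n<h2>Chapter 1</h2>\n<h1>Title 2</h1>\nIntroduction";

$chapters = [];
$current = -1; // index of the main chapter currently being filled
foreach (explode("\n", $string) as $line) {
    if (strpos($line, '<h1>') !== false) {
        $current++;               // a new main chapter starts on this line
        $chapters[$current] = [];
    }
    if ($current >= 0) {          // ignore anything before the first <h1>
        $chapters[$current][] = trim($line);
    }
}
print_r($chapters);
```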

However, simple string tools cannot reliably extract HTML elements from a string in the general case. Try using DOMDocument::loadHTML() instead.
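As a rough sketch of the DOMDocument route (it needs PHP's DOM extension): walk the top-level nodes in document order and start a new block at every h1 and a new chapter at every h2. parseChapters() and the array keys are made-up names, and the sample string is a shortened version of the one in the question.

```php
<?php
// Sketch only: parseChapters() and the array keys are hypothetical names.
function parseChapters(string $html): array
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);              // libxml wraps the fragment in <html><body>
    libxml_clear_errors();

    $result = [];
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $node) {
        $text = trim($node->textContent);
        $last = count($result) - 1;     // index of the current <h1> block
        if ($node->nodeName === 'h1') {
            $result[] = ['title' => $text, 'intro' => '', 'chapters' => []];
        } elseif ($last < 0 || $text === '') {
            continue;                   // before the first <h1>, or whitespace only
        } elseif ($node->nodeName === 'h2') {
            $result[$last]['chapters'][] = ['title' => $text, 'line' => '', 'description' => []];
        } elseif (!$result[$last]['chapters']) {
            $result[$last]['intro'] .= $text;   // text between <h1> and the first <h2>
        } else {
            $c = count($result[$last]['chapters']) - 1;
            if ($node instanceof DOMElement && $node->getAttribute('class') === 'description') {
                $result[$last]['chapters'][$c]['description'][] = $text;
            } else {
                $result[$last]['chapters'][$c]['line'] = $text;
            }
        }
    }
    return $result;
}

$html = '<h1>Title 1</h1> Introduction <h2>Chapter 1</h2><p>Always just one line</p>'
      . '<p class="description">Some more text.</p><p class="description">Maybe with multiple lines.</p>'
      . '<h1>Title 2</h1> Introduction <h2>Chapter 1</h2><p>Always just one line</p>';

print_r(parseChapters($html));
```

Because the parser handles attributes, nesting and whitespace, this does not depend on how the tags are written or on where the line breaks fall.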

andy