extract some html with preg_match

Question

I use preg_match to extract some HTML (I tried DOMDocument, but I had some problems with newlines). Anyway, this is my code:

1.html

<body>


            <!-- icon and title -->
            <div class="smallfont">
                <img class="inlineimg" src="images/icons/icon1.gif" alt="" border="0" />
                <strong>qrtoobah 3nwan</strong>
            </div>
            <hr size="1" style="color:#CCCCCC; background-color:#CCCCCC" />
            <!-- / icon and title -->


        <div id="post_message_14142536">

            <font size="7"><font color="red">msaha 700</font></font><br />
<font size="7"><font color="red">shamali 20</font></font><br />
<font size="7"><font color="red"> 1700 almetr</font></font><br />
<font size="7"><font color="#ff0000">sooom bs</font></font><br />
<font size="7"><font color="#ff0000">albee3 qreeb</font></font>
        </div>
        <!-- message -->


</body>

extract.php

<?php 
$html = file_get_contents("1.html");
$pattern = '/<([!]+)([^]+).*>([^]+)(message\ \-\-\>)/';
   preg_match($pattern, $html, $matches);
 print_r($matches);


?>

I want to get everything between )blablabla(... but I get this array:

Array ( [0] => [1] => ! [2] => -- [3] => message --> )

I think the problem is much more trivial. Right click -> View Source. `Array( [0]` looks empty because its content is an HTML comment and therefore not displayed. — MarcDefiant, Sep 07 '12 at 14:09
You also need to pass the "s" modifier (PCRE_DOTALL) to make `.` match newlines. — MarcDefiant, Sep 07 '12 at 14:11
Is there any way to extract it, or to extract the two divs above? — aboji, Sep 07 '12 at 14:20
I just can't help myself: "The <center> cannot hold it is too late." http://stackoverflow.com/a/1732454/1174378 — Mihai Todor, Sep 07 '12 at 14:35
[The pony he comes...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — , Sep 07 '12 at 22:53

StasGrin · Accepted Answer · 2012-09-07T22:58:47.243

Use strpos to find the position of the first tag, then use strpos again to find the ending tag. If you know where the content starts and ends, and both markers are unique, there is no need for the preg_* functions.

So I guess something like this will work fine (I kept the code as clear as possible so the idea can be followed step by step):

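// $text holds the HTML, e.g. $text = file_get_contents("1.html");
$tag_begin = "<!-- icon and title -->";
$tag_end   = "<!-- message -->";
// strpos() takes the haystack first, then the needle
$begin     = strpos($text, $tag_begin) + strlen($tag_begin);
// search for the end marker only after the start marker
$end       = strpos($text, $tag_end, $begin);
// substr() takes the string, the start offset and a length (not an end offset)
$result    = substr($text, $begin, $end - $begin);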
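One thing the snippet does not cover is a missing marker: strpos returns false in that case, which would silently be treated as offset 0. A minimal sketch of the same idea wrapped in a function is shown here. The name extract_between is made up for illustration, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of PHP): returns the text between two
// markers, or null if either marker cannot be found.
function extract_between(string $text, string $tag_begin, string $tag_end): ?string
{
    $begin = strpos($text, $tag_begin);
    if ($begin === false) {
        return null; // start marker missing
    }
    $begin += strlen($tag_begin);

    // Look for the end marker only after the start marker.
    $end = strpos($text, $tag_end, $begin);
    if ($end === false) {
        return null; // end marker missing
    }

    return substr($text, $begin, $end - $begin);
}

// Shortened stand-in for the contents of 1.html.
$sample = "<!-- icon and title --><strong>qrtoobah 3nwan</strong><!-- message -->";
echo extract_between($sample, "<!-- icon and title -->", "<!-- message -->");
```

Note the strict `=== false` comparison: a marker at the very start of the string is found at offset 0, which a loose comparison would confuse with "not found".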

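You can also do exactly the same thing if you want to find and store every structure between an opening comment and its closing comment (for example <!-- icon and title --> and <!-- / icon and title -->).
The only change you must make is to first find the names of all opening structures with preg_match_all. For example: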

$result_cnt = preg_match_all('#<!-- [^/].*-->#', $text , $openings);

// Output for your example HTML is:
$openings = 
array (
  0 => 
  array (
    0 => '<!-- icon and title -->',
    1 => '<!-- message -->',
  ),
)

After that, loop once over $openings and use the code above to find everything needed, just adding the closing "/" character to each opening in the right place.
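That loop could look roughly like the sketch below. The function name extract_sections is made up for illustration, and the code assumes every closing comment is written exactly as "<!-- / name -->" (single spaces, as in the example HTML). The pattern is a slightly modified version of the one above: it adds a capture group for the name and uses a lazy quantifier.

```php
<?php
// Hypothetical helper: maps each opening comment name to the text found
// between "<!-- name -->" and "<!-- / name -->".
function extract_sections(string $text): array
{
    $sections = [];

    // Capture the name of every opening comment (one not starting with "/").
    preg_match_all('#<!-- ([^/].*?) -->#', $text, $openings, PREG_SET_ORDER);

    foreach ($openings as $opening) {
        $tag_begin = $opening[0];                      // e.g. "<!-- icon and title -->"
        $tag_end   = '<!-- / ' . $opening[1] . ' -->';  // e.g. "<!-- / icon and title -->"

        $begin = strpos($text, $tag_begin) + strlen($tag_begin);
        $end   = strpos($text, $tag_end, $begin);
        if ($end === false) {
            continue; // no closing comment for this name, skip it
        }

        $sections[$opening[1]] = trim(substr($text, $begin, $end - $begin));
    }

    return $sections;
}

// Shortened stand-in for the contents of 1.html.
$sample = "<!-- icon and title -->\n<strong>qrtoobah 3nwan</strong>\n"
        . "<!-- / icon and title -->\n<div>msaha 700</div>\n<!-- message -->";
print_r(extract_sections($sample));
```

With this sample only "icon and title" ends up in the result, because "<!-- message -->" has no matching closing comment and is skipped.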
