I'm using preg_match to extract some HTML (I tried DOMDocument but had problems with newlines). Anyway, here is my code.

1.html

<body>


            <!-- icon and title -->
            <div class="smallfont">
                <img class="inlineimg" src="images/icons/icon1.gif" alt="" border="0" />
                <strong>qrtoobah 3nwan</strong>
            </div>
            <hr size="1" style="color:#CCCCCC; background-color:#CCCCCC" />
            <!-- / icon and title -->


        <div id="post_message_14142536">

            <font size="7"><font color="red">msaha 700</font></font><br />
<font size="7"><font color="red">shamali 20</font></font><br />
<font size="7"><font color="red"> 1700 almetr</font></font><br />
<font size="7"><font color="#ff0000">sooom bs</font></font><br />
<font size="7"><font color="#ff0000">albee3 qreeb</font></font>
        </div>
        <!-- message -->


</body>

extract.php

<?php 
$html = file_get_contents("1.html");
$pattern = '/<([!]+)([^]+).*>([^]+)(message\ \-\-\>)/';
preg_match($pattern, $html, $matches);
print_r($matches);


?>

I want to get everything between <!-- icon and title --> and <!-- / message -->, but I get this array:

Array ( [0] => [1] => ! [2] => -- [3] => message --> ) 
Mihai Iorga
aboji
  • I think the problem is much more trivial. Right click -> View Source. `Array( [0]` looks empty because it holds an HTML comment, which the browser does not display. – MarcDefiant Sep 07 '12 at 14:09
  • You also need to pass the "s" or the "m" (not sure which) modifier to make `.` match newlines – MarcDefiant Sep 07 '12 at 14:11
  • Is there any way to extract it, or to extract the two divs above? – aboji Sep 07 '12 at 14:20
  • I just can't help myself: "The <center> cannot hold it is too late." http://stackoverflow.com/a/1732454/1174378 – Mihai Todor Sep 07 '12 at 14:35
  • [The pony he comes...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) –  Sep 07 '12 at 22:53
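
For reference, the two points raised in the comments above can be sketched as follows. The modifier that makes `.` match newlines is `s` (PCRE_DOTALL); `m` only changes how `^` and `$` behave. This is a minimal sketch, not a general HTML parser: the inline `$html` string is a shortened stand-in for 1.html, and it assumes both comment markers appear exactly once.

```php
<?php
// Shortened stand-in for the contents of 1.html (assumption for this example).
$html = "<body>\n<!-- icon and title -->\n<strong>qrtoobah 3nwan</strong>\n"
      . "<!-- / icon and title -->\n<div>msaha 700</div>\n<!-- message -->\n</body>";

// "s" (PCRE_DOTALL) lets "." match newlines; ".*?" is lazy, so the match
// stops at the first "<!-- message -->" it finds.
$pattern = '/<!-- icon and title -->(.*?)<!-- message -->/s';

if (preg_match($pattern, $html, $matches)) {
    // htmlspecialchars() keeps the captured markup visible in a browser
    // instead of letting it be rendered or hidden as a comment.
    echo htmlspecialchars($matches[1]), "\n";
}
```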

1 Answer


Use strpos to find the position of the opening marker, then use strpos again to find the closing marker. If you know exactly where the content starts and ends, and both markers are unique, there is no need for the preg_* functions.

So I guess something like this will work fine (I kept the code as clear as possible to show the idea step by step):

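// $text holds the HTML, e.g. $text = file_get_contents("1.html");
$tag_begin = "<!-- icon and title -->";
$tag_end   = "<!-- message -->";
// strpos() takes the haystack first, then the needle
$begin     = strpos($text, $tag_begin) + strlen($tag_begin);
$end       = strpos($text, $tag_end, $begin);
// substr() takes the string, the start offset, then the length
$result    = substr($text, $begin, $end - $begin);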
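
A slightly more defensive version of the same idea, as a rough sketch: strpos returns false when a marker is missing, so it helps to check for that before calling substr. The function name `extract_between` and the inline sample string are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (name invented for this example): returns the text
// between two unique markers, or null when either marker is missing.
function extract_between(string $text, string $tag_begin, string $tag_end): ?string
{
    $start = strpos($text, $tag_begin);
    if ($start === false) {
        return null; // opening marker not found
    }
    $start += strlen($tag_begin);

    // Look for the closing marker only after the opening one.
    $end = strpos($text, $tag_end, $start);
    if ($end === false) {
        return null; // closing marker not found
    }

    return substr($text, $start, $end - $start);
}

// Shortened stand-in for the contents of 1.html (assumption for this example).
$text = "<!-- icon and title -->\n<strong>qrtoobah 3nwan</strong>\n"
      . "<!-- / icon and title -->\n<div>msaha 700</div>\n<!-- message -->";

$result = extract_between($text, "<!-- icon and title -->", "<!-- message -->");
echo trim($result), "\n";
```

Returning null instead of false keeps the "not found" case distinct from an empty match.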


You can also do exactly the same if you want to find and store every structure between an opening <!-- (.*) --> and its closing <!-- / (.*) -->.
The only change you need is to first collect the names of all opening structures with preg_match_all. For example:

// ".*?" is lazy, so two comments on the same line are not merged into one match
$result_cnt = preg_match_all('#<!-- [^/].*?-->#', $text, $openings);

// Output for your example HTML is:
$openings = 
array (
  0 => 
  array (
    0 => '<!-- icon and title -->',
    1 => '<!-- message -->',
  ),
)

After that, loop over $openings once and use the code above to extract each section. The closing marker for each entry is built by adding the "/" character at the right place in the opening marker.
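
That loop might look roughly like this. It is only a sketch: it assumes every opening marker has a matching "<!-- / name -->" closing marker (the 1.html in the question has none for "message", so that entry would simply be skipped there), and the `$sections` array and the inline sample are invented for the example.

```php
<?php
// Stand-in sample (assumption): unlike 1.html, both sections here have a
// matching "<!-- / name -->" closing marker.
$text = "<!-- icon and title -->\n<strong>qrtoobah 3nwan</strong>\n<!-- / icon and title -->\n"
      . "<!-- message -->\n<div>msaha 700</div>\n<!-- / message -->";

// Collect opening markers: "<!-- " followed by anything except a slash.
preg_match_all('#<!-- [^/].*?-->#', $text, $openings);

$sections = [];
foreach ($openings[0] as $opening) {
    // Build the closing marker by inserting "/ " after the leading "<!-- ".
    $closing = '<!-- / ' . substr($opening, strlen('<!-- '));

    $start = strpos($text, $opening) + strlen($opening);
    $end   = strpos($text, $closing, $start);
    if ($end === false) {
        continue; // no closing marker for this section, skip it
    }
    $sections[$opening] = trim(substr($text, $start, $end - $start));
}

print_r($sections);
```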

StasGrin