
First, apologies for my English.

I have a table like the one below.

 <table>
  <tr class="_in" id="1">
    <td>content</td>
    <td>content
         <h1>content h1</h1>
    </td>
  </tr>
  <tr class="_in" id="2">
    <td>content</td>
    <td>content
        <table>
            <tr>
                <td>content</td>
            </tr>
        </table>
    <h2>content h2</h2>
    </td>
  </tr>
  <tr class="_in" id="3">
    <td>content</td>
    <td>
            <table>
              <tr>
                <td>content</td>
              </tr>
            </table>
            <h3>content h3</h3>
    </td>   
  </tr>
  <tr class="_in" id="4">
    <td>content</td>
    <td>content
        <h1>content h3</h1>
    </td>
  </tr>
  <tr class="_in" id="5">
    <td>content</td>
    <td>content
        <h1>content h1</h1>
    </td>
  </tr>
</table>

As you can see, I want to use a regular expression to get each tr that has class="_in", but some of these tr elements contain another table, which in turn contains its own tr tags. In addition, a tr with class="_in" can end in several ways: with </h1></td></tr>, </h2></td></tr>, or </h3></td></tr>.

My solution is to use the alternation (or) operator, but it returns no results. My code is below:

$html=file_get_contents("vnair3.txt");
$parten='/<tr\sclass=\"_in\"[^>]*>.*(?:<\/h1>|<\/h2>|<\/h3>)\s+<\/td>\s+<\/tr>/isU';
preg_match_all($parten,$html,$output);
print_r($output);

Please help me get each tr tag that has class="_in" as a separate element of the output array. I am using PHP. Thanks, all.

Mr.Lak
  • Not sure if I understood you, but... can you check [this](http://simplehtmldom.sourceforge.net/)? – Jose Adrian Sep 19 '12 at 02:37
  • No. Stop. Don't use a Regular Expression. For the love of God, please resist the urge. I know that it seems like a good idea, but it isn't. Just listen to @JoseAdrian and use a DOM Parser. Your soul depends on it. – maiorano84 Sep 19 '12 at 02:43
  • Thanks Jose and maio, I'll try it. But is there also a way to do this with a regular expression? – Mr.Lak Sep 19 '12 at 02:51
  • Mr.Lak, you will never achieve what you want with regex, it's simply not realistic. You are much better off following @JoseAdrian's advice. If you're not convinced, read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Korvin Szanto Sep 19 '12 at 03:50
  • Why would one use @JoseAdrian's suggestion when DOMDocument is in a standard PHP install? – KyleWpppd Sep 21 '12 at 18:44
  • I think he is saying to use a Dom parser... not necessarily the library I suggested. – Jose Adrian Sep 22 '12 at 04:01

2 Answers


Modify your code as follows to add class="_in" to each tr:

<?php
$html=file_get_contents('vnair3.txt');
$output=str_replace("<tr","<tr class='_in' ",$html,$count);
//echo $output;
print_r($output);
?>
Man Programmer

First, slurp the HTML into a DOMDocument.

$dom = new DOMDocument();
$dom->loadHTML($html_string);

Then find all your <TR> elements.

$trs = $dom->getElementsByTagName('tr');

Then iterate over them.

foreach($trs as $tr) {
    $classes = $tr->getAttribute('class');
    $classes .= " _tr ";
    $tr->setAttribute('class', $classes);
}

Then export the string.

$html = $dom->saveHTML();

For reference: http://www.php.net/manual/en/class.domdocument.php
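
If the goal is to collect each outer tr with class="_in" as its own array element, as the question asks, the same DOMDocument approach can be combined with DOMXPath. The following is only a minimal sketch: the inline sample markup and the $query and $output names are assumptions for illustration, and in the question the HTML would come from vnair3.txt instead.

```php
<?php
// Hypothetical sample input standing in for the contents of vnair3.txt.
$html = <<<HTML
<table>
  <tr class="_in" id="1"><td>content</td><td>content<h1>content h1</h1></td></tr>
  <tr class="_in" id="2"><td>content</td><td>content
    <table><tr><td>nested</td></tr></table>
    <h2>content h2</h2></td></tr>
</table>
HTML;

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select <tr> elements whose class attribute contains the token "_in".
$query = '//tr[contains(concat(" ", normalize-space(@class), " "), " _in ")]';

$output = [];
foreach ($xpath->query($query) as $tr) {
    // saveHTML($node) serializes only this row, including any nested table.
    $output[] = $dom->saveHTML($tr);
}

print_r($output);
```

Because the nested tr elements carry no class, they are not selected on their own; they only appear inside the serialized HTML of the row that contains them.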

KyleWpppd