
I just started to use PHP Simple HTML DOM Parser.

Now I'm trying to extract all elements wrapped in a <b> tag, including the <b> and </b> tags themselves, from an existing HTML document. This works fine with:

foreach($html->find('b') as $q)
    echo $q;

How can I show only the elements wrapped in <b></b> tags that are followed by a <span class="marked">?

Update: I've used Firebug to get the CSS path for the elements. Now it looks like this:

foreach ($html->find('html body div#wrapper table.desc tbody tr td div span.marked') as $x)
    foreach ($x->find('html body div#wrapper table.desc tbody tr td table.split tbody tr td b') as $d)
        echo $d;

But it doesn't work. Any ideas?

Update:

To clarify my question, here is a sample tr from the document, together with the opening and closing table tags.

<table width="100%" border="0" cellspacing="0" cellpadding="0" class="desc">
    <tr>
        <th width="25%" scope="col"><div align="center">1</div></th>
        <th width="50" scope="col"><div align="center">2</div></th>
        <th width="10%" scope="col"><div align="center">3</div></th>
        <th width="15%" scope="col"><div align="center">4</div></th>
    </tr>
    <tr>
        <td valign="top" bgcolor="#E9E9E9"><div style="text-align: center; font-weight: bold; margin-top: 2px"> 1 </div></td>
        <td>
            <table width="100%" border="0" cellspacing="0" cellpadding="0" class="split">  <tr>
                    <td>
                        <b> element to extract</b></td>
                </tr>
                <tr>
                    <td>
                        <table width="100%" border="0" cellspacing="0" cellpadding="0" class="split">  <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        1
                                    </div>
                                </td>
                                <td>
                                    abed
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        2
                                    </div>
                                </td>
                                <td>
                                    ddee
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        3
                                    </div>
                                </td>
                                <td>
                                    xdef
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        4
                                    </div>
                                </td>
                                <td>
                                    abbcc
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        5
                                    </div>
                                </td>
                                <td>
                                    ab
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        6
                                    </div>
                                </td>
                                <td>
                                    e1
                                </td>
                            </tr>
                        </table>
                    </td>
                </tr>
            </table>
        </td>
        <td valign="top"><div style="text-align: center"> <span class="marked">marked</span> </div></td>
        <td valign="top"><div style="text-align: center">  </div></td>
    </tr>
</table>
vbd

2 Answers


Try the following CSS selector:

b > span.marked

That would return the span though, so you probably have to do $e->parent() to get to the b element.
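For comparison, here is a rough sketch of the same "find the span, then step up to its parent" idea using PHP's bundled DOM extension and XPath rather than SimpleHtmlDom. The sample markup and variable names are made up for illustration, and the XPath expression is only a stand-in for the `b > span.marked` selector.

```php
<?php
// Hypothetical sample markup, not taken from the question.
$markup = '<div><b><span class="marked">foo</span></b><b>bar</b></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// XPath counterpart of "b > span.marked": span children of a b element.
// Note: @class="marked" only matches when the class attribute is exactly "marked".
$matches = [];
foreach ($xpath->query('//b/span[@class="marked"]') as $span) {
    // parentNode plays the role of SimpleHtmlDom's parent().
    $matches[] = $span->parentNode;
}

// Print each matching b element, tags included.
foreach ($matches as $b) {
    echo $doc->saveHTML($b), "\n";
}
```

Only the first b is collected here, because the second one has no marked span inside it.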

Also see Best Methods to parse HTML for alternatives to SimpleHtmlDom


Edit after update:

Your browser modifies the DOM when it parses the page. If you look at your markup, you will see that there are no tbody elements, yet Firebug gives you:

html body div#wrapper table.desc tbody tr td div span.marked
html body div#wrapper table.desc tbody tr td table.split tbody tr td b

Also, your question does not match the queries. You asked how to find

elements surrounded with the <b>,</b>-tags followed by a <span class="marked">

That can be read to mean either

<b><span class="marked">foo</span></b>

or

<b><element>foo</element></b><span class="marked">foo</span>

For the first, use the child combinator shown earlier. For the second, use the adjacent sibling combinator

b + span.marked

to get the span and then use $e->prev_sibling() to return the previous sibling of element (or null if not found).
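If the sibling combinator turns out not to be available in your parser, the same lookup can be approximated with PHP's bundled DOM extension and XPath. This is a minimal sketch under that assumption: the markup and variable names are invented for illustration, and the XPath expressions only mimic `b + span.marked` followed by a step back to the previous sibling.

```php
<?php
// Hypothetical sample markup: only the first b is directly followed by a marked span.
$markup = '<div><b><i>foo</i></b><span class="marked">foo</span>'
        . '<b>other</b><span>plain</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Rough counterpart of "b + span.marked": the first element sibling after a b,
// provided it is a span whose class attribute is exactly "marked".
$spans = $xpath->query('//b/following-sibling::*[1][self::span and @class="marked"]');

$matches = [];
foreach ($spans as $span) {
    // Step back to the closest preceding element sibling, which is the b.
    $previous = $xpath->query('preceding-sibling::*[1]', $span)->item(0);
    if ($previous !== null) {
        $matches[] = $previous;
    }
}

foreach ($matches as $b) {
    echo $doc->saveHTML($b), "\n";
}
```

Querying `preceding-sibling::*[1]` instead of reading `previousSibling` directly skips any whitespace text nodes that sit between the two elements.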

However, your markup contains neither of these. There is only a DIV with a SPAN child that has the marked class:

<div style="text-align: center"> <span class="marked">marked</span>

If that is what you want to match, it's the child combinator again. Of course, you have to change the b then to a div.
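If the actual goal is to get the b elements that sit in the same outer table row as a marked span, which is only a guess at the intent, one possible sketch with PHP's bundled DOM extension and XPath looks like this. The markup is a trimmed-down, hypothetical version of the sample in the question (the second row is invented to show a non-match), and the expression avoids tbody on purpose.

```php
<?php
// Simplified stand-in for the question's markup; the second row is invented.
$markup = <<<'HTML'
<table class="desc">
  <tr>
    <td><table class="split"><tr><td><b> element to extract</b></td></tr></table></td>
    <td><div> <span class="marked">marked</span> </div></td>
  </tr>
  <tr>
    <td><table class="split"><tr><td><b> unmarked element</b></td></tr></table></td>
    <td><div> </div></td>
  </tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Rows that have a cell containing div > span.marked, then every b inside such a row.
// No tbody step is used, so the query does not depend on a browser-inserted tbody.
$found = [];
foreach ($xpath->query('//tr[td/div/span[@class="marked"]]//b') as $b) {
    $found[] = $b;
}

foreach ($found as $b) {
    echo trim($b->textContent), "\n";
}
```

The inner rows of the split tables are skipped by the row filter because their cells contain no div with a marked span.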

Gordon
  • @Eray I don't know up to what level SimpleHtmlDom implements them though. And tbh, I don't see why I would need them (or SimpleHtmlDom) when I can use DOM and XPath :) – Gordon Jan 26 '11 at 14:10
  • FTR, simplehtmldom doesn't support sibling selectors, but [some alternatives do.](http://scraperblog.blogspot.com/2012/11/choosing-php-html-parser.html) – pguardiario Nov 06 '12 at 00:27

A simpler approach, taken from the manual:

foreach($html->find('b') as $q)
    echo $q->plaintext;
Anton