I know this is probably covered in other threads, but I've searched all over StackOverflow and tried many solutions without success, which is why I'm asking.

Given this HTML:

<div class="someclass">
<table>
    <tbody>
        <tr>
            <th class="state">Status</th>
            <th class="name">Name</th>
            <th class="type">Type</th>
            <th class="length">Length</th>
            <th class="height">Height</th>
        </tr>
        <tr>
            <td class="state state2"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">2000 m</td>
            <td class="height"></td>
        </tr>
        <tr>
            <td class="state state1"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">2250 m</td>
            <td class="height"></td>
        </tr>
        <tr>
            <td class="state state1"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">3000 m</td>
            <td class="height"></td>
        </tr>
        <tr>
            <td class="state state2"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">2250 m</td>
            <td class="height"></td>
        </tr>
    </tbody>
</table>
</div>

This is the PHP code I have so far:

$dom = new DOMDocument();
$dom->loadHtmlFile('http://www.whatever.com');
$dom->preserveWhiteSpace = false;

$xp = new DOMXPath($dom);
$col = $xp->query('//td[contains(@class, "state1") and (contains(@class, "state"))]');
$length = 0;

foreach( $col as $n ) {
    $parent = $n->parentNode;
    $length += $parent->childNodes->item(3)->nodeValue; 
}
echo 'Length: ' . $length;

I need to:

1.- Sum the 'length' values so I can echo the total, stripping the ' m' suffix from each value.

2.- Understand what I'm getting wrong with the 'parentNode', 'childNodes' and 'item()' parts. After many attempts I still get 'Length: 0'.

I know this isn't the place to get a full detailed explanation, but it is really hard to find tutorials targeting these specific issues. It would be great if someone could give some advice on where I can find this information.

Thanks very much in advance.

Edited the 'Concat' part for simplicity.

John Slegers
Karls
  • You have a syntax error on the line where you do the query. – Musa Jan 18 '16 at 12:48
  • Thanks, could you tell me what's that syntax error? Can't see it. – Karls Jan 18 '16 at 12:53
  • You did not escape the `'` in your concat function. – Musa Jan 18 '16 at 12:54
  • Sorry @Musa, obviously a beginner here... I still don't see the error. – Karls Jan 18 '16 at 12:57
  • Inside concat you have `' '`; to use single quotes inside a single-quoted string you have to escape them with a \ (backslash). – Musa Jan 18 '16 at 13:02
  • @Musa, my concat sentence comes from http://stackoverflow.com/questions/5662404/how-can-i-select-an-element-with-multiple-classes-with-xpath?lq=1 I'm trying different combinations with slashes, but can't find the right solution. – Karls Jan 18 '16 at 13:24
  • Have a read of this page: http://php.net/manual/en/language.types.string.php. If you still can't figure it out, I'll just tell you the correct syntax. – Musa Jan 18 '16 at 13:26
  • @Musa, I did read that page before and now, and still can't get it right. – Karls Jan 18 '16 at 13:46
  • `$col=$xp->query('//td[contains(concat(\' \',@class,\' \'), "state1") and (contains(concat(\' \',@class,\' \'), "state")]');` – Musa Jan 18 '16 at 14:04
  • @Musa, this is the first thing I tried, I even did that before your first comment, but this is the result: https://eval.in/503910 – Karls Jan 18 '16 at 14:28
  • I will change the concat to something simpler so I can get answers to my questions – Karls Jan 18 '16 at 14:31
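
For readers following the quoting discussion in the comments above, here is a small sketch of three ways to write an XPath string literal inside a PHP string. The original query is no longer shown in the question, so the expression below is an assumption modelled on Musa's comment, and the one-row table is made up for illustration. XPath 1.0 accepts either quote style for its literals, so escaping can usually be avoided by mixing quote types.

```php
<?php
// Same XPath expression, spelled three ways as a PHP string.
// 1) Single-quoted PHP string, inner single quotes escaped with a backslash.
$escaped = '//td[contains(concat(\' \', @class, \' \'), \' state1 \')]';
// 2) Single-quoted PHP string, XPath literals in double quotes (no escaping).
$doubleInside = '//td[contains(concat(" ", @class, " "), " state1 ")]';
// 3) Double-quoted PHP string, XPath literals in single quotes (no escaping).
$singleInside = "//td[contains(concat(' ', @class, ' '), ' state1 ')]";

// Hypothetical one-row table, just enough to run the queries against.
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td class="state state1"></td><td class="state state2"></td></tr></table>');
$xp = new DOMXPath($dom);

$countDouble = $xp->query($doubleInside)->length;
$countSingle = $xp->query($singleInside)->length;

var_dump($escaped === $singleInside);   // variants 1 and 3 are the same string
var_dump($countDouble === $countSingle); // variants 2 and 3 match the same cells
```

Padding @class with spaces and searching for " state1 " matches the class as a whole token, so a class such as "state10" would not be picked up by accident.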

1 Answer

Navigating a DOMDocument to a specific child node's value with DOMXPath

// Return the first run of digits in $string as an integer,
// e.g. "2250 m" gives 2250. Returns 0 when there are no digits.
function getInt($string)
{
    if (preg_match('/[0-9]+/', $string, $val))
    {
        return intval($val[0]);
    }

    return 0;
}

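// $html is assumed to hold the markup from the question.
$dom = new DOMDocument();
// Configure the document before loading it.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);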

$xp = new DOMXPath($dom);
$length = 0;

foreach($xp->query('//td[@class="state state1"]/following-sibling::*[3]') as $element)
{
    $value = $element->nodeValue;
    $length += getInt($value);
}


echo $length;
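
As a variation on the code above, the length cell can be selected by its class instead of by its position, which should keep working if the site reorders its columns. This is only a sketch under a few assumptions: $html is a copy of the table from the question, and hasClass() is a hypothetical helper introduced here, not part of DOMXPath.

```php
<?php
// Assumption: a copy of the table from the question.
$html = <<<HTML
<div class="someclass">
<table>
    <tbody>
        <tr><th class="state">Status</th><th class="name">Name</th><th class="type">Type</th><th class="length">Length</th><th class="height">Height</th></tr>
        <tr><td class="state state2"></td><td class="name"></td><td class="type t18"></td><td class="length">2000 m</td><td class="height"></td></tr>
        <tr><td class="state state1"></td><td class="name"></td><td class="type t18"></td><td class="length">2250 m</td><td class="height"></td></tr>
        <tr><td class="state state1"></td><td class="name"></td><td class="type t18"></td><td class="length">3000 m</td><td class="height"></td></tr>
        <tr><td class="state state2"></td><td class="name"></td><td class="type t18"></td><td class="length">2250 m</td><td class="height"></td></tr>
    </tbody>
</table>
</div>
HTML;

// Hypothetical helper: builds an XPath 1.0 test for one whole class token.
function hasClass($name)
{
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $name . ' ")';
}

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Rows that contain a td with the class token "state1",
// then the "length" cell of each of those rows.
$query = '//tr[td[' . hasClass('state1') . ']]/td[' . hasClass('length') . ']';

$length = 0;
foreach ($xp->query($query) as $cell)
{
    // Keep only the leading digits of values such as "2250 m".
    if (preg_match('/\d+/', $cell->nodeValue, $m))
    {
        $length += (int) $m[0];
    }
}

echo 'Length: ' . $length; // prints "Length: 5250" for the table above
```

The header row is skipped because its cells are th elements, so it never satisfies the td test in the row predicate.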
MasoodRehman
  • thanks. Could you explain that in some detail please? – Karls Jan 18 '16 at 18:43
  • For the DOM you provided, it takes the values 3000 m and 2250 m from the nodes, passes each one to getInt() to extract the integer part of the string, and adds the returned value to the current $length. – MasoodRehman Jan 18 '16 at 18:46
  • Please check https://eval.in/504064 Notice I've filled every td with values and the results seem to match and sum the first td's for every tr, as per your exact code. – Karls Jan 18 '16 at 18:50
  • It seems it accesses the values of all the child nodes. OK, let me dig into it more. – MasoodRehman Jan 18 '16 at 18:55
  • It sums all lengths, doesn't discriminate between 'state1' and 'state2', etc. Check https://eval.in/504086 – Karls Jan 18 '16 at 19:38
  • OK, which lengths do you need to get and add up? Can you mention your test cases? – MasoodRehman Jan 18 '16 at 19:43
  • If I go with //td[contains(@class, "state1") and (contains(@class, "state"))], then length = 0. Check https://eval.in/504097 – Karls Jan 18 '16 at 19:45
  • I need to sum the lengths for td's with class 'state state1' – Karls Jan 18 '16 at 19:47
  • Can you add an extra class to the element? If so, then your problem is solved. – MasoodRehman Jan 18 '16 at 19:47
  • It's an external website page, so nope... :( – Karls Jan 18 '16 at 19:50
  • And finally, what we want is exactly the following sibling, meaning a later element in the same row, and our target element is at position 3, so here we go: check the updated code. – MasoodRehman Jan 18 '16 at 20:06
  • Aaand you did it! Thank you very much Masood, really appreciate it. Can I ask you where to find information for similar cases like the one I asked for? I have to get data for many other sites, and each one is different. Many are really easy, but others have caveats like this one. I've searched a lot, but examples are very simple or repetitions of the typical ones. Again, thanks! – Karls Jan 18 '16 at 20:24
  • You can search the PHP site and, of course, StackOverflow, but it needs a strong foundation in the language you are working in; you have to understand your core problem by digging into it. – MasoodRehman Jan 19 '16 at 04:58
  • In your current problem, for example, the main thing was to find the 3rd sibling of the matched (state state1) element; the sibling was what we were searching for. The more you dig, the more you will learn. Happy coding :) – MasoodRehman Jan 19 '16 at 05:03
  • Finally, don't forget to vote on the answer so others can find it useful. Thanks. – MasoodRehman Jan 19 '16 at 05:03
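
On the second point of the question (the parentNode, childNodes and item() parts), one likely cause is that childNodes lists every kind of node, and depending on the parser that can include whitespace text nodes between the cells, so item(3) is not guaranteed to be the fourth td. The sketch below keeps the asker's parentNode approach but filters the row's children down to elements before indexing. The shortened table is an assumption made for illustration.

```php
<?php
// Assumption: a shortened version of the table from the question.
$html = <<<HTML
<table>
    <tbody>
        <tr>
            <td class="state state2"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">2000 m</td>
            <td class="height"></td>
        </tr>
        <tr>
            <td class="state state1"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">2250 m</td>
            <td class="height"></td>
        </tr>
        <tr>
            <td class="state state1"></td>
            <td class="name"></td>
            <td class="type t18"></td>
            <td class="length">3000 m</td>
            <td class="height"></td>
        </tr>
    </tbody>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$length = 0;
foreach ($xp->query('//td[contains(@class, "state1")]') as $stateCell)
{
    // Collect only the element children of the row, skipping any text nodes.
    $cells = [];
    foreach ($stateCell->parentNode->childNodes as $child)
    {
        if ($child->nodeType === XML_ELEMENT_NODE)
        {
            $cells[] = $child;
        }
    }

    // Zero-based order of cells: state, name, type, length.
    // Strip every non-digit so "2250 m" becomes "2250" before casting.
    $length += (int) preg_replace('/\D+/', '', $cells[3]->nodeValue);
}

echo 'Length: ' . $length; // prints "Length: 5250" for the table above
```

The accepted query sidesteps the same problem in a different way: following-sibling::*[3] counts elements only, so text nodes never shift the position.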