I have to parse an HTML page presenting a call list. After converting it to XML, the structure is:

<body>
    <form name="mainform" method="POST" action="baz" class="all">
        <input type="submit" value="" style="position:absolute;top:-9999px;left:-9999px;" name="apply"/>
        <p>foo</p>
        <div class="bar">
            ..
        </div>
        <br/>
        <div class="onPageTabsBox">
            <ul class="tabs onPageTabs">
                ...
            </ul>
        </div>
        <table id="baz">
            <tr class="thead">
                ...
            </tr>
        </table>
        <div id="uiScroll">
            <table id="bla">
                <tr class="showif_in">
                    ...
                </tr>
                ...    
                <tr class="showif_out">
                    <td class="call_out" title="outbound call" datalabel="29.12.19 11:13"/>
                    <td>29.12.19 11:13</td>
                    <td title="Doe, John (privat) = 0123456789" datalabel="Name / Rufnummer">
                        <a href=" " onclick="return onDial('0123456789');">Doe, John (privat)</a>
                    </td>
                    <td datalabel="foo">bar</td>
                    <td title="987654 (Internet)" datalabel="own number">987654</td>
                    <td class="duration" data-timestr="0:02" datalabel="duration">2 Min</td>
                    <td class="btncolumn">
                        ...                        
                    </td>
                </tr>
                <tr class="showif_out">
                    ...
                </tr>

What I need is a function that gets the phone numbers of incoming, outgoing, ... calls. So I try to get the phone number(s) from the `td` node whose `title` contains " = ". At present, the function looks like this:

function getCallList($config, string $type = '')
{
    ...
    $xmlSite = convertHTMLtoXML($response);
    switch ($type) {
        case 'in':
        case 'out':
        case 'fail':
        case 'rejected':
            $query = sprintf('//form/div/table/tr[@class="showif_%s"]', $type);
            break;
        default:                                   // get all recorded calls
            $query = '//form/div/table/tr';
    }
    $rows = $xmlSite->xpath($query);
    foreach ($rows as $row) {
        $numbers = $row->xpath('substring-after(//td[@title], " = ")');
    }
    ...
}

After consulting similar questions here, I tried `$numbers = $row->evaluate('substring-after(//td[@title], " = ")');` and several other XPath expressions; unfortunately, I can't get the substring. Apart from that, I suspect it should also be possible to get an array of the phone numbers with just one query.
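Two things get in the way here: `SimpleXMLElement` has no `evaluate()` method, and its `xpath()` only returns nodes, so a string-valued expression like `substring-after()` cannot hand back a string. Also, `//td` starts from the document root, not from `$row`. A minimal sketch of the per-row idea using PHP's `DOMXPath`, which does have `evaluate()` and accepts a context node; `simplexml_load_string()` stands in for the question's `convertHTMLtoXML()`, and the second row is invented:

```php
<?php
// Trimmed copy of the question's markup; the second showif_out row is invented.
$xml = <<<XML
<body>
  <form name="mainform" method="POST" action="baz" class="all">
    <div id="uiScroll">
      <table id="bla">
        <tr class="showif_out">
          <td class="call_out" title="outbound call"/>
          <td title="Doe, John (privat) = 0123456789">Doe, John (privat)</td>
          <td title="987654 (Internet)">987654</td>
        </tr>
        <tr class="showif_out">
          <td class="call_out" title="outbound call"/>
          <td title="Roe, Jane = 0555123">Roe, Jane</td>
        </tr>
      </table>
    </div>
  </form>
</body>
XML;

// Stand-in for the question's convertHTMLtoXML() result.
$xmlSite = simplexml_load_string($xml);

// Wrap the same underlying document in DOMXPath, which has evaluate().
$xpath = new DOMXPath(dom_import_simplexml($xmlSite)->ownerDocument);

$numbers = [];
foreach ($xpath->query('//form/div/table/tr[@class="showif_out"]') as $row) {
    // No leading "//": the path is evaluated relative to the current row,
    // and only the first matching cell of each row is used.
    $number = $xpath->evaluate('substring-after(td[contains(@title, " = ")]/@title, " = ")', $row);
    if ($number !== '') {
        $numbers[] = $number;
    }
}
print_r($numbers);
```

This still needs one query per row; it only moves the substring step into XPath.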

Black Senator

1 Answer

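As mentioned here and here, you unfortunately can't accomplish this in one query with XPath 1.0: string functions such as `substring-after()` return a single string, and when given a node-set as an argument they only use its first node in document order.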
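The limitation is easy to observe with PHP's `DOMXPath::evaluate()`, which can return a string result. In this sketch (first row from the question, second row invented), the query covers both rows but yields only the first number. XPath 2.0 and later could map the function over every node (e.g. `//td/substring-after(@title, ' = ')`), but libxml, which PHP's XML extensions build on, implements XPath 1.0 only.

```php
<?php
// First row from the question, second row invented for illustration.
$xml = <<<XML
<table id="bla">
  <tr class="showif_out"><td title="Doe, John (privat) = 0123456789"/></tr>
  <tr class="showif_out"><td title="Roe, Jane = 0555123"/></tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The node-set argument is converted to the string value of its first node,
// so this single query returns one string, not one number per row.
$result = $xpath->evaluate('substring-after(//tr[@class="showif_out"]/td/@title, " = ")');
var_dump($result); // string(10) "0123456789"
```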

What you could do instead is select all the `title` attributes of these `<td>`s, then use `preg_match` to grab everything after an `=` surrounded by spaces:

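// Collect the title attribute of every cell in the outgoing-call rows
$rowTitleAttrs = $xmlSite->xpath('//tr[@class="showif_out"]/td/@title');

$phoneNumbers = [];
foreach ($rowTitleAttrs as $rowTitleAttr) {
  // Keep only titles containing " = " and capture everything after it
  if (preg_match('/(?<= = )(?<phoneNumber>.*?)$/', $rowTitleAttr->title, $matches)) {
    $phoneNumbers[] = $matches['phoneNumber'];
  }
}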
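A self-contained sketch for trying the loop above locally. It wraps the answer's code in a hypothetical `getPhoneNumbers()` function that takes the call type like the question's `getCallList()`, uses a trimmed copy of the question's markup (the `showif_in` row is invented), and casts each attribute node with `(string)` to read its value:

```php
<?php
// Trimmed copy of the question's markup; the showif_in row is invented.
$xml = <<<XML
<body>
  <form name="mainform" method="POST" action="baz" class="all">
    <div id="uiScroll">
      <table id="bla">
        <tr class="showif_in">
          <td title="Roe, Jane = 0555123">Roe, Jane</td>
        </tr>
        <tr class="showif_out">
          <td class="call_out" title="outbound call"/>
          <td title="Doe, John (privat) = 0123456789">
            <a href=" " onclick="return onDial('0123456789');">Doe, John (privat)</a>
          </td>
          <td title="987654 (Internet)">987654</td>
        </tr>
      </table>
    </div>
  </form>
</body>
XML;
$xmlSite = simplexml_load_string($xml);

// Hypothetical wrapper: $type is "in", "out", "fail" or "rejected".
function getPhoneNumbers(SimpleXMLElement $xml, string $type): array
{
    // xpath() can return false or null on error, so fall back to an empty list.
    $titleAttrs = $xml->xpath(sprintf('//tr[@class="showif_%s"]/td/@title', $type)) ?: [];

    $phoneNumbers = [];
    foreach ($titleAttrs as $titleAttr) {
        // (string) reads the attribute value; titles without " = " don't match.
        if (preg_match('/(?<= = )(?<phoneNumber>.*?)$/', (string) $titleAttr, $matches)) {
            $phoneNumbers[] = $matches['phoneNumber'];
        }
    }
    return $phoneNumbers;
}

print_r(getPhoneNumbers($xmlSite, 'out'));
```

With `'in'` the same call returns the invented row's `0555123`, and a type with no rows returns an empty array.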

I took the liberty of simplifying your XPath query in the process, since the class name should be specific enough that you don't have to spell out the whole path leading to it.

Demo: https://3v4l.org/1oqqA
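On the simplified, class-based query: dropping the path is safe for the typed queries, but for the question's "all calls" default a bare `//tr` would also pick up the `<tr class="thead">` header row of `table#baz`, which `//form/div/table/tr` skips because that table sits directly under `<form>`. A sketch of the question's switch rewritten with class-based queries, assuming every data row's class starts with `showif_` (`callListQuery()` is a hypothetical name, and the sample rows are invented):

```php
<?php
// Hypothetical helper mirroring the question's switch, using class-based queries.
function callListQuery(string $type = ''): string
{
    switch ($type) {
        case 'in':
        case 'out':
        case 'fail':
        case 'rejected':
            return sprintf('//tr[@class="showif_%s"]', $type);
        default:
            // All recorded calls: rows whose class starts with "showif_",
            // which skips header rows such as <tr class="thead">.
            return '//tr[starts-with(@class, "showif_")]';
    }
}

// Invented sample with the same nesting as the question's page.
$xml = <<<XML
<form>
  <table id="baz"><tr class="thead"><td>header</td></tr></table>
  <div id="uiScroll">
    <table id="bla">
      <tr class="showif_in"><td title="Roe, Jane = 0555123"/></tr>
      <tr class="showif_out"><td title="Doe, John (privat) = 0123456789"/></tr>
    </table>
  </div>
</form>
XML;
$xmlSite = simplexml_load_string($xml);

echo count($xmlSite->xpath(callListQuery('out'))), PHP_EOL; // 1
echo count($xmlSite->xpath(callListQuery())), PHP_EOL;      // 2
```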

Jeto
  • Thanks for your help. `->query` doesn't work for me, but `->xpath` does. The regex pattern doesn't match yet; I'm trying to figure out why. Apart from that, regular expressions are still as cryptic to me as XPath. – Black Senator Jan 01 '20 at 21:45
  • @BlackSenator Oh, you're using the SimpleXML version, I missed that (though ideally that's something you should've shared). Anyway, I edited my answer, it should be better now :) – Jeto Jan 01 '20 at 21:53
  • Sorry for the omission :) But the regex still doesn't match; I get an empty array. Since the question is now drifting in another direction, should I open a separate question? – Black Senator Jan 01 '20 at 22:04
  • I just noticed the change from `->value` to `->title`. It runs like a charm :) – Black Senator Jan 01 '20 at 22:12
  • @BlackSenator Happy to hear it. – Jeto Jan 01 '20 at 22:19
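Since the regex felt cryptic: the same extraction can be done with plain string functions. A minimal sketch, assuming you only need the text after the first " = " (which is also what the lookbehind pattern captures); `numberFromTitle()` is a hypothetical helper name and the sample row is trimmed from the question:

```php
<?php
// Hypothetical regex-free helper: keep whatever follows the first " = ".
function numberFromTitle(string $title): ?string
{
    $pos = strpos($title, ' = ');
    // strpos() returns false when the separator is absent, e.g. "outbound call".
    return $pos === false ? null : substr($title, $pos + strlen(' = '));
}

$xml = <<<XML
<table>
  <tr class="showif_out">
    <td class="call_out" title="outbound call"/>
    <td title="Doe, John (privat) = 0123456789">Doe, John (privat)</td>
    <td title="987654 (Internet)">987654</td>
  </tr>
</table>
XML;
$xmlSite = simplexml_load_string($xml);

$phoneNumbers = [];
foreach ($xmlSite->xpath('//tr[@class="showif_out"]/td/@title') as $titleAttr) {
    $number = numberFromTitle((string) $titleAttr);
    if ($number !== null) {
        $phoneNumbers[] = $number;
    }
}
print_r($phoneNumbers);
```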