
I have to read information from an HTML page and transfer it into several arrays for further processing. My attempts with XPath have not yet given me access to the data I want.

The body section contains a table with a varying number of rows, as in the following example:

...
</tr>
<tr>
    <td class="name" title="43PUS6551" datalabel="43PUS6551">
        <span>43PUS6551</span>
    </td>
    <td datalabel="Internetnutzung" class="usage">eingeschränkt</td>
    <td datalabel="Onlinezeit heute" class="bar time">
        <span title="03:20 von 14:00 Stunden">
            <span style="width:23.81%;"/>
        </span>
    </td>
    <td datalabel="Zugangsprofil" class="profile">
        <select name="profile:user6418">
            <option value="filtprof1">Standard</option>
            <option value="filtprof3">Unbeschränkt</option>
            <option value="filtprof4">Gesperrt</option>
            <option value="filtprof5334">Network</option>
            <option value="filtprof5333" selected="selected">Stream</option>
            <option value="filtprof4526">X-Box_One</option>
        </select>
    </td>
    <td datalabel="" class="btncolumn">
        <button type="submit" name="edit" id="uiEdit:user6418" value="filtprof5333" class="icon edit" title="Bearbeiten"/>
    </td>
</tr>
<tr>
...

I need one array that uses the title attribute of the first <td> (line 2 of the row) as the key and the name attribute of the <select> element (line 12) as the value.

$devices = [
    '43PUS6551' => 'profile:user6418',
    …
]

I started with this and I'm able to retrieve the keys for this array:

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($response);
    $xmlSite = simplexml_import_dom($dom);

    $devices = [];
    $rows = $xmlSite->xpath('//tr/td[@title=@datalabel]');
    foreach ($rows as $row) {
        $key = utf8_decode((string)$row->attributes()['title']);
        // ... this is where I still need the matching value for $key
    }

But now I'm struggling to get the corresponding value. I tried different ways: going up with parent and back down to the <select> node, or using following-sibling. But I can't get the XPath syntax right.

Once that works, I need a second array that uses the name attribute of the <select> element (line 12) as the key and the value attribute of the currently selected <option> as the value.

$filters = [
    'profile:user6418' => 'filtprof5333',
    …
]

Finally, I need one array containing the label and value of every <option> (the same list appears in every row):

$profiles = [
    'Standard' => 'filtprof1',
    'Unbeschränkt' => 'filtprof3',
    …
    'X-Box_One' => 'filtprof4526',
]

Any hints on the proper XPath expressions would be appreciated.

Black Senator
  • [PHP DomDocument](https://www.php.net/manual/en/class.domdocument.php) may be what you are looking for. – Martin Chuka Jul 31 '19 at 23:21
  • Could I have tried xpath without PHP DOMDocument? – Black Senator Aug 01 '19 at 12:41
  • Well at this point, it depends on your code, you didn't put any code here though. Take a look at this answer, might help [Difference between simplexml and Dom](https://stackoverflow.com/questions/4803063/whats-the-difference-between-phps-dom-and-simplexml-extensions) – Martin Chuka Aug 01 '19 at 13:24

2 Answers


Try this:

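// $str holds the HTML source of the page
// [^>]* also accepts extra attributes such as selected="selected";
// [^<]+ also accepts labels with umlauts such as "Unbeschränkt"
preg_match_all('/<option value="([a-z0-9]+)"[^>]*>([^<]+)<\/option>/', $str, $match, PREG_SET_ORDER);
$profiles = array();
foreach ($match as $row) {
  $profiles[$row[2]] = $row[1];   // label => filter ID
}
print_r($profiles);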
  • To be honest, I don't like preg_match very much, especially when, as in this case, I'm not sure what content the requested website will return. I would therefore rather implement this with xpath. – Black Senator Aug 01 '19 at 12:24

The following works as desired for me:

    // convert html response into SimpleXML
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($response);
    $xmlSite = simplexml_import_dom($dom);

    // initialize result arrays
    $devices = [];
    $profiles = [];
    $filters = [];

    // parse SimpleXML with xpath to get current data
    $rows = $xmlSite->xpath('//tr/td[@title=@datalabel]');  // these are the rows with assignments of devices to filters
    foreach ($rows as $row) {
        $key = utf8_decode((string)$row->attributes()['title']);    // name (label) of the devices
        if (preg_match('/Alle /', $key)) {                          // skip standard settings
            continue;
        }
        $select = $row->xpath('parent::*//select[@name]');  // find the line with the currently assigned ID for the device
        $value = (string)$select[0]->attributes()['name'];  // get the current ID ('profile:user*' or 'profile:landevice*')
        $devices[$key] = $value;

        $options = $select[0]->xpath('option');             // the defined filters (dropdown in each row)
        foreach ($options as $option) {
            $profiles[utf8_decode((string)$option)] = (string)$option->attributes()['value'];   // get label and ID of filters
            if (isset($option->attributes()['selected'])) {     // determine the filter currently assigned to the device
                $filters[$value] = (string)$option->attributes()['value'];  // get device (ID) and filter (ID)
            }
        }
    }
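
For comparison, here is a minimal sketch of the same extraction using DOMXPath directly, without the SimpleXML conversion. The `$response` sample is a shortened copy of the row from the question, and selecting the `<tr>` first is an assumption about the page layout that I have not checked against the real page. The sample labels are ASCII only, so no character-set conversion is shown.

```php
<?php
// Shortened sample row from the question; a real page would contain many rows.
$response = <<<'HTML'
<html><body><table>
<tr>
    <td class="name" title="43PUS6551" datalabel="43PUS6551"><span>43PUS6551</span></td>
    <td datalabel="Zugangsprofil" class="profile">
        <select name="profile:user6418">
            <option value="filtprof1">Standard</option>
            <option value="filtprof5333" selected="selected">Stream</option>
        </select>
    </td>
</tr>
</table></body></html>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$devices  = [];
$filters  = [];
$profiles = [];

// Select every <tr> that has a name cell, so all lookups below are relative to the row.
foreach ($xpath->query('//tr[td[@title = @datalabel]]') as $tr) {
    $name   = $xpath->evaluate('string(td[@title = @datalabel]/@title)', $tr);  // device label
    $select = $xpath->evaluate('string(.//select/@name)', $tr);                 // e.g. 'profile:user6418'
    $devices[$name] = $select;

    foreach ($xpath->query('.//select/option', $tr) as $option) {
        $profiles[trim($option->textContent)] = $option->getAttribute('value'); // label => filter ID
        if ($option->hasAttribute('selected')) {
            $filters[$select] = $option->getAttribute('value');                 // device ID => filter ID
        }
    }
}

print_r($devices);
print_r($filters);
print_r($profiles);
```

Starting from the row avoids the step up via `parent::*`. If the name cell is used as the context node instead, `following-sibling::td//select/@name` should reach the same attribute.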
Black Senator