I'm having trouble with DOMDocument/XPath. The HTML (I have no control over it) looks like this:

.. random html ..

<div class="separator"></div>
<div class="date">01-01-1900</div>

<div class="item"><div>1 HTML garbage</div></div>
<div class="item"><div>2 HTML garbage</div></div>


<div class="separator"></div>
<div class="date">12-12-2012</div>

<div class="item"><div>3 HTML garbage</div></div>
<div class="item"><div>4 HTML garbage</div></div>
<div class="item"><div>5 HTML garbage</div></div>
<div class="item"><div>6 HTML garbage</div></div>

.. more random html ...

How I need my data:

$result = array(
    '01-01-1900' => array(
        array('name' => '1 HTML garbage'),
        array('name' => '2 HTML garbage')
    ),
    '12-12-2012' => array(
        array('name' => '3 HTML garbage'),
        array('name' => '4 HTML garbage'),
        array('name' => '5 HTML garbage'),
        array('name' => '6 HTML garbage')
    )
);

Since the depth can change, I can't use a fixed path copied from my browser console. How can I group the items by date? Right now I can get a list of the items by using:

$xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " item ")]');

1 Answer

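Since you are using PHP, you can first get all the dates and then iterate over them, fetching the items for each date with an expression like this (untested):

//*[contains(@class,'item') and preceding-sibling::*[contains(@class,'date')][1][contains(text(),'12-12-2012')]]

with 12-12-2012 as the searched value. The [1] picks the nearest preceding date, so items listed under a later date are not matched as well.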
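
A minimal sketch of that loop, for illustration only. It assumes the dates and items are siblings under one parent (the outer `<div id="list">` is made up for the sample), and it swaps the text comparison for a position count so the date string never has to be quoted inside the expression. `classTest()` is a hypothetical helper, not part of DOMXPath.

```php
<?php
// Sample markup mirroring the question; the outer <div id="list"> is an assumption.
$html = '<div id="list">'
    . '<div class="separator"></div><div class="date">01-01-1900</div>'
    . '<div class="item"><div>1 HTML garbage</div></div>'
    . '<div class="item"><div>2 HTML garbage</div></div>'
    . '<div class="separator"></div><div class="date">12-12-2012</div>'
    . '<div class="item"><div>3 HTML garbage</div></div>'
    . '<div class="item"><div>4 HTML garbage</div></div>'
    . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: builds a whole-word class test for use in a predicate.
function classTest(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$isDate = classTest('date');
$isItem = classTest('item');

$result = [];
foreach ($xpath->query("//*[$isDate]") as $dateNode) {
    // 1-based position of this date among the date siblings of the same parent.
    $n = (int) $xpath->evaluate("count(preceding-sibling::*[$isDate])", $dateNode) + 1;

    // Following item siblings that have exactly $n dates before them,
    // i.e. the items between this date and the next one.
    $items = $xpath->query(
        "following-sibling::*[$isItem][count(preceding-sibling::*[$isDate]) = $n]",
        $dateNode
    );

    $date = trim($dateNode->textContent);
    foreach ($items as $item) {
        $result[$date][] = ['name' => trim($item->textContent)];
    }
}

print_r($result);
```

This runs one extra query per date, so it is fine for a handful of dates but does repeated work on long lists.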

Community
Artjom B.
  • The "date" field in my example follows a standard format, but the real data doesn't, so I can't really loop over it. I could query the entire HTML for all the dates and then loop over them (as if each were an arbitrary string), but this seems extremely inefficient. Is this really the only way of doing it with XPath? –  May 03 '14 at 19:31
  • Thanks man, I made it work with your example. If anybody else is facing the same problem: this is way too hack-ish and I wouldn't recommend unless you have to deliver something for yesterday. –  May 03 '14 at 19:48
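
On the efficiency concern raised in the comments: one possible alternative, sketched here under assumptions rather than taken from the answer, is to select the date and item elements in a single query and group them in PHP while walking the result in document order. It assumes every item belongs to the most recent date that precedes it in the document; the sample markup and its outer `<div id="list">` are made up for illustration.

```php
<?php
// Sample markup mirroring the question; the outer <div id="list"> is an assumption.
$html = '<div id="list">'
    . '<div class="separator"></div><div class="date">01-01-1900</div>'
    . '<div class="item"><div>1 HTML garbage</div></div>'
    . '<div class="item"><div>2 HTML garbage</div></div>'
    . '<div class="separator"></div><div class="date">12-12-2012</div>'
    . '<div class="item"><div>3 HTML garbage</div></div>'
    . '<div class="item"><div>4 HTML garbage</div></div>'
    . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Whole-word class tests, same pattern as the query in the question.
$isDate = "contains(concat(' ', normalize-space(@class), ' '), ' date ')";
$isItem = "contains(concat(' ', normalize-space(@class), ' '), ' item ')";

$result = [];
$currentDate = null;

// One location step, so the matches come back in document order.
foreach ($xpath->query("//*[$isDate or $isItem]") as $node) {
    $classes = preg_split('/\s+/', trim($node->getAttribute('class')));

    if (in_array('date', $classes, true)) {
        // A date element starts a new group.
        $currentDate = trim($node->textContent);
        $result[$currentDate] = $result[$currentDate] ?? [];
    } elseif ($currentDate !== null) {
        // An item is filed under the last date seen; items before any date are skipped.
        $result[$currentDate][] = ['name' => trim($node->textContent)];
    }
}

print_r($result);
```

The date text is only ever used as an array key here, so its format does not matter.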