Scraping using DomXPath

Question

I am using PHP's DOMXPath to scrape some websites.

I am currently following this tutorial to learn how to traverse XPaths.

I am currently scraping this site to get the character names and Steam IDs (the mess of an XPath below is what gets one Steam ID).

My problem is that there are multiple Steam IDs and character names, but the XPath that I painstakingly created only gets one.

How should I scrape all of the Steam IDs instead of just one of them?

$xpath = new DomXPath($this->ourTeamHTML);

/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");

$steamName = $xpath->query('//*[@id="wrapper"]/section/div/div[1]/div[2]/div[2]/div[1]/div/div/div[1]/div/div[1]/h5/b');
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($steamName as $node) {
    echo "Steam Name: " . $node->nodeValue . "\n";
}

har07 · Accepted Answer · 2015-06-13T06:07:39.030

Your XPath is too verbose. Because it spells out the full path with element indexes, it is hard to read and tends to break when the page source changes even slightly. Try the following simpler XPath:

//*[@id="wrapper"]//div[@class='col-md-12']//h5/b

It returned all the Steam IDs and character names (32 elements in total) from the linked page for me (tested using Firefox's FirePath add-on).
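As a rough sketch of how that expression could be plugged into the PHP code from the question: the HTML below is a made-up stand-in for the real page (the names, IDs, and nesting are assumptions), so treat it only as an illustration of the relative `//` steps matching every `h5/b` instead of a single indexed one.

```php
<?php
// Hypothetical markup standing in for the real page; the actual structure may differ.
$html = <<<'HTML'
<div id="wrapper">
  <div>
    <div class="col-md-12">
      <h5><b>Alice</b></h5>
      <h5><b>STEAM_0:1:111</b></h5>
    </div>
    <div class="col-md-12">
      <h5><b>Bob</b></h5>
      <h5><b>STEAM_0:0:222</b></h5>
    </div>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// No positional indexes: every matching <b> under #wrapper is returned.
$nodes = $xpath->query('//*[@id="wrapper"]//div[@class="col-md-12"]//h5/b');

$values = [];
foreach ($nodes as $node) {
    $values[] = trim($node->nodeValue);
}
print_r($values);
```

Note that `@class='col-md-12'` is an exact string match, so a `div` carrying additional classes would not be selected; `contains(@class, 'col-md-12')` is a common looser alternative.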

edited Jun 13 '15 at 06:07

answered Jun 13 '15 at 06:00

Cool - that sounds like a great tool! – theGreenCabbage Jun 13 '15 at 06:05
If I want to store these into a `name` => `SteamID` array, I suppose I could separate the name and Steam ID using a `%2` operator on the array indices? – theGreenCabbage Jun 13 '15 at 06:07
foreach ($steamName as $id => $node) { if($id % 2 == 0) { echo "Steam Name: " . $node->nodeValue . "\n"; } else if ($id % 2 == 1) { echo "Steam ID: " . $node->nodeValue . "\n"; } } – theGreenCabbage Jun 13 '15 at 06:09
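One possible way to build the `name` => `SteamID` array mentioned in the comments is to pair consecutive values instead of testing `$id % 2`. This is only a sketch: it assumes the nodes strictly alternate name, ID, name, ID in document order (not verified against the real page), and the function name `pairNamesWithIds` and the sample values are hypothetical.

```php
<?php
/**
 * Pair up a flat list of alternating values into [name => steamId].
 * Assumes the order name, ID, name, ID, ...; a trailing unpaired value is skipped.
 */
function pairNamesWithIds(array $values): array
{
    $players = [];
    foreach (array_chunk($values, 2) as $pair) {
        if (count($pair) === 2) {
            [$name, $steamId] = $pair;
            $players[$name] = $steamId;   // a repeated name would overwrite the earlier entry
        }
    }
    return $players;
}

// Hypothetical values, e.g. collected from each node's nodeValue.
$values = ['Alice', 'STEAM_0:1:111', 'Bob', 'STEAM_0:0:222'];
$players = pairNamesWithIds($values);
print_r($players);
```

If the page ever lists a name without an ID, the pairing shifts by one for everything after it, so querying each player's container element and reading both values relative to it would be more robust.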
