
I would like to know how to scrape content from a website's HTML source using PHP. I have tried http://simplehtmldom.sourceforge.net/ and have also looked at How do you parse and process HTML/XML in PHP?, but I'm still having a hard time getting the information out of the page source. As you can see below, the main page contains a list of links to authors, each showing the author's years and the number of books they wrote.

<div id="fleft">
    <ul>
    <li><a href="http://www.books.com/john-smith/index.html">John Smith (2011-2012)</a> : 11 books
    <li><a href="http://www.books.com/bobby-bob/index.html">Bobby Bob (2011-2012)</a> : 89 books
    ....
    </ul>
    </div>
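A minimal sketch of turning this index page into an array of authors with PHP's built-in DOMDocument and DOMXPath, assuming the markup is exactly as shown (link text "Name (YYYY-YYYY)" followed by " : N books"). The function name parseAuthorIndex() and the array keys are hypothetical choices for illustration, not part of any library.

```php
<?php
/**
 * Parse the author index page into an array of authors.
 * Assumes: <div id="fleft"><ul><li><a href="...">Name (YYYY-YYYY)</a> : N books
 */
function parseAuthorIndex(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed (e.g. unclosed <li>); collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $authors = [];
    foreach ($xpath->query('//div[@id="fleft"]//li/a') as $link) {
        // Link text looks like "John Smith (2011-2012)"
        $linkText = trim($link->textContent);
        if (!preg_match('/^(.*?)\s*\((\d{4})-(\d{4})\)$/', $linkText, $m)) {
            continue; // not an author link in the expected format
        }
        // The text node right after the link looks like " : 11 books"
        $after = $link->nextSibling !== null ? $link->nextSibling->textContent : '';
        $bookCount = preg_match('/:\s*(\d+)\s*books?/', $after, $c) ? (int) $c[1] : null;

        $authors[] = [
            'name'      => $m[1],
            'from'      => (int) $m[2],
            'to'        => (int) $m[3],
            'bookCount' => $bookCount,
            'url'       => $link->getAttribute('href'),
        ];
    }
    return $authors;
}
```

Each entry's 'url' is the link you would follow next to reach that author's list of books.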

When I click on John Smith, it opens the list of books that John Smith wrote.

 <h1>John Smith (11 Books)</h1>
    <div id="fleft">

    <ul>
    <li><a href="http://www.books.com/john-smith/best-book.html">Best Book</a>
    <li><a href="http://www.books.com/john-smith/other-best-book.html">Other Best Book</a>
....
    </ul>
    </div>
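The author page has the same div#fleft list structure, so the same DOMXPath approach applies. This sketch assumes the page looks exactly like the snippet above; parseAuthorPage() and its return shape are hypothetical names chosen for illustration.

```php
<?php
/**
 * Parse one author's page: the <h1> heading plus the list of book links.
 * Assumes: <h1>Name (N Books)</h1> followed by div#fleft > ul > li > a.
 */
function parseAuthorPage(string $html): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about malformed markup while parsing
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // string(...) makes evaluate() return the text content directly
    $heading = trim($xpath->evaluate('string(//h1[1])'));

    $books = [];
    foreach ($xpath->query('//div[@id="fleft"]//li/a') as $a) {
        $books[] = [
            'title' => trim($a->textContent),
            'url'   => $a->getAttribute('href'),
        ];
    }
    return ['heading' => $heading, 'books' => $books];
}
```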

When I click on one of the books, e.g. "Best Book", it shows the title of the book, the author, and the whole story of the book.

<div id="bookbox">
<h1>Book : Best Book</h1>

<h2>Author : John Smith</h2>
<pre>
story of the best book......
.......
....
the end
</pre>
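For the book page itself, a sketch along the same lines reads the <h1>, <h2> and <pre> inside div#bookbox and strips whatever "Label :" prefix precedes the colon in each heading. It assumes the markup above; parseBookPage() is a hypothetical helper name.

```php
<?php
/**
 * Parse a single book page into title, author and full text.
 * Assumes: div#bookbox containing <h1>Book : Title</h1>,
 * <h2>(label) : Name</h2> and the story inside <pre>.
 */
function parseBookPage(string $html): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about malformed markup while parsing
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Remove everything up to and including the first colon, e.g. "Book : "
    $stripLabel = fn(string $s): string => trim(preg_replace('/^[^:]*:\s*/', '', trim($s)));

    return [
        'title'   => $stripLabel($xpath->evaluate('string(//div[@id="bookbox"]/h1)')),
        'author'  => $stripLabel($xpath->evaluate('string(//div[@id="bookbox"]/h2)')),
        // <pre> preserves the story's original line breaks
        'content' => trim($xpath->evaluate('string(//div[@id="bookbox"]/pre)')),
    ];
}
```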

I would like to grab every author's name and years, their list of books, and the content of each book, essentially as a dataset. Can someone help me or show me a PHP code sample that does this? I would like to create a database containing all the authors' names, the years of their lives, the books they wrote, book titles, categories, book contents, etc.
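Once the pages are parsed into arrays, one way to persist them is PDO with SQLite, sketched below. The table layout, the column names and the input shape (each author as ['name', 'from', 'to', 'books' => [['title', 'content'], ...]]) are assumptions for illustration; saveDataset() is a hypothetical name, and the pdo_sqlite extension must be installed.

```php
<?php
/**
 * Store scraped authors and their books in a database via PDO.
 * Assumed input shape (hypothetical):
 * [['name' => ..., 'from' => ..., 'to' => ..., 'books' => [['title' => ..., 'content' => ...], ...]], ...]
 */
function saveDataset(PDO $db, array $authors): void
{
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS authors (
        id INTEGER PRIMARY KEY, name TEXT, year_from INTEGER, year_to INTEGER)');
    $db->exec('CREATE TABLE IF NOT EXISTS books (
        id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id), title TEXT, content TEXT)');

    // Prepared statements keep scraped text from breaking the SQL
    $insertAuthor = $db->prepare('INSERT INTO authors (name, year_from, year_to) VALUES (?, ?, ?)');
    $insertBook   = $db->prepare('INSERT INTO books (author_id, title, content) VALUES (?, ?, ?)');

    // A single transaction makes bulk inserts much faster in SQLite
    $db->beginTransaction();
    try {
        foreach ($authors as $author) {
            $insertAuthor->execute([$author['name'], $author['from'], $author['to']]);
            $authorId = (int) $db->lastInsertId();
            foreach ($author['books'] as $book) {
                $insertBook->execute([$authorId, $book['title'], $book['content']]);
            }
        }
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack(); // leave the database unchanged on failure
        throw $e;
    }
}
```

For a file-backed database, pass new PDO('sqlite:/path/to/books.db') instead of an in-memory one; the same code should work with MySQL via a different DSN, though the CREATE TABLE syntax may need adjusting.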

Asked by merrill

1 Answer


You should mention which approach you are using to get the HTML of the target page. I suppose that you already have the HTML of the target page in a $targetHTML variable.
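If you don't have the HTML yet, one simple option is file_get_contents() with a stream context, sketched here under the assumption that allow_url_fopen is enabled (the cURL extension is the usual alternative). fetchHtml() is a hypothetical helper name, and the User-Agent string and timeout are arbitrary example values.

```php
<?php
/**
 * Download a page's HTML with file_get_contents().
 * Requires allow_url_fopen; the user agent and timeout are example values.
 */
function fetchHtml(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleScraper/0.1)',
            'timeout'    => 10, // seconds
        ],
    ]);
    // @ silences PHP's warning; the failure is reported via the exception instead
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}
```

Usage would look like $targetHTML = fetchHtml('http://www.books.com/john-smith/index.html');. When crawling many pages, pausing between requests (e.g. sleep(1)) is kinder to the target server.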

You can load it into a DOM like this:

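/*********** Load In Dom *********/
$html = new DOMDocument;
// Collect parser warnings about malformed HTML instead of printing them
libxml_use_internal_errors(true);
$html->loadHTML($targetHTML);
$xPath = new DOMXPath($html);
/*********** Load In Dom *********/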

You can then use XPath to fetch the data you want from the HTML loaded in the DOM.

If you are already using this approach, show your code so we can find the problem.

Regards

Answered by CoreCoder
  • Sorry, I'm still lost on that. Can you give me more samples, please? I'm really a beginner in PHP. – merrill Oct 26 '11 at 00:16
  • I would like to know how to write PHP code that loads the HTML, passes it in to create a DOMDocument, uses that document to create a DOMXPath, and then traverses it with XPath to build an array of authors. – merrill Oct 26 '11 at 00:45
  • Would the code I posted here be easier/better? http://stackoverflow.com/questions/7911095/php-how-to-store-list-of-author-in-array-dictionary-web-scraper – merrill Oct 27 '11 at 02:29