
I'm running a service that involves a website owner placing a piece of code on their website. It doesn't matter which page it goes on.

I'm looking for a way, using PHP, to check the pages of a given website for a piece of code. Could anyone point me in the right direction?

I know how to scan a single page at a given URL, but I need a way of collecting all of the site's page URLs with PHP and searching each of those pages for a line of code.

Thanks! :)

Frank

2 Answers


This could be as simple as using cURL to fetch the HTML and strpos() to check whether that specific string is present in it.
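A minimal sketch of that approach, with the fetch and the string check split into two functions (the function names and the snippet value are placeholders, not anything from the question):

```php
<?php
// Pure check, separated out so it is easy to test without a network call.
function htmlContainsSnippet(string $html, string $snippet): bool
{
    return strpos($html, $snippet) !== false;
}

// Fetch a page body with cURL; returns null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of echoing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}
```

Then something like `htmlContainsSnippet(fetchHtml($url), $snippet)` tells you whether the page contains the code verbatim.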

However, there are real problems with this! If you are requiring folks to put a link up or something, it is easy for them to hide that element later with CSS by simply setting display: none. To get around that, you would need something more advanced that actually checks the element's existence and visibility. PhantomJS can be used for this.

Now, what happens when folks want to use a minifier or otherwise modify your HTML while keeping the spirit of the link intact? I suggest not looking for the exact HTML, but checking for what you are really after, such as a back link to your site. In any case, a DOM parser can help with this.
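For example, a back-link check could be sketched with PHP's built-in DOMDocument like this (the domain passed in is a placeholder for your real one):

```php
<?php
// Sketch: rather than matching exact HTML, parse the DOM and look for any
// anchor whose href points at your domain.
function hasBackLink(string $html, string $yourDomain): bool
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from messy real-world HTML

    foreach ($doc->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        if ($host && stripos($host, $yourDomain) !== false) {
            return true; // found a link back to the site
        }
    }
    return false;
}
```

This survives minification and attribute reordering, since it only cares about where the link points.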

Brad

You want to parse the DOM of each page, search for links, and then scan those URLs as well. Be sure to keep track of which pages you've already scanned and which ones you still need to scan, otherwise you'll end up in an infinite loop. You should also add a delay between requests, otherwise you can easily overwhelm a server by firing off hundreds of requests back to back.
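The steps above can be sketched as a small breadth-first crawler. The function names, the queue-based approach, and the 50-page cap are my own; a real crawler would also resolve relative URLs:

```php
<?php
// Pull out the absolute links that stay on the same host.
function extractSameHostLinks(string $html, string $host): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[] = $href;
        }
    }
    return $links;
}

// Breadth-first scan: the visited set prevents infinite loops,
// and sleep() throttles the requests.
function crawlForSnippet(string $startUrl, string $snippet, int $maxPages = 50): bool
{
    $host    = parse_url($startUrl, PHP_URL_HOST);
    $queue   = [$startUrl];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // already scanned this page
        }
        $visited[$url] = true;

        $html = @file_get_contents($url);
        if ($html === false) {
            continue;
        }
        if (strpos($html, $snippet) !== false) {
            return true; // snippet found on this page
        }

        foreach (extractSameHostLinks($html, $host) as $link) {
            $queue[] = $link;
        }
        sleep(1); // be polite: one request per second
    }
    return false;
}
```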

The SimpleHTMLDom documentation gives an example of parsing a page for links.

SpenserJ