
I am working on a PHP-based scraper/crawler, which works fine until it encounters a .NET-generated href link of the form __doPostBack(...). Any idea how to deal with this and crawl the pages behind those links?

Widor
Aman

1 Answer


Instead of trying to automate clicking the JavaScript link, which requires additional libraries in PHP, try replicating the request your browser sends after you click it. Various Firefox extensions will help you examine that request, such as Tamper Data, Firebug, and Live HTTP Headers.
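As a rough illustration of replicating that request: an ASP.NET `__doPostBack(target, argument)` call writes its two arguments into the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form, together with the other hidden fields such as `__VIEWSTATE`. The sketch below assumes the target page follows that standard pattern. The helper names `buildPostBackFields` and `sendPostBack` are made up for this example, and your page may need extra fields or headers, so compare the result against what the browser actually sends.

```php
<?php
// Sketch only: the helper names below are hypothetical, not part of any library.

/**
 * Collect the hidden inputs of an ASP.NET page (e.g. __VIEWSTATE) and set
 * the two fields that __doPostBack() fills in before submitting the form.
 */
function buildPostBackFields(string $html, string $eventTarget, string $eventArgument = ''): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }

    // These are the values the JavaScript call would have written.
    $fields['__EVENTTARGET'] = $eventTarget;
    $fields['__EVENTARGUMENT'] = $eventArgument;

    return $fields;
}

/**
 * POST the fields back to the page with cURL.
 * Returns the response body as a string, or false on failure.
 */
function sendPostBack(string $url, array $fields, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // Keep the session cookie between the initial GET and the POST.
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ]);

    return curl_exec($ch);
}
```

A typical flow would be to fetch the page once, pass its HTML and the first argument of the link's `__doPostBack('...', '...')` call to `buildPostBackFields()`, and then hand the resulting array to `sendPostBack()` with the same URL. The response should be the page behind the link, which can be scraped like any other page.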

hoju
  • Hmm, that could be a nice idea. I hadn't really thought about replicating the header information. I will give it a try and let you know. Thanks a lot, plumo – Aman Apr 15 '11 at 04:10
  • I tried, but this .NET form seems to be sending values in the header in some kind of encoded format: FwEPDwUKMTg4OTUzMTc1MQ9kFgICAw9kFhACBQ8QDxYGHg1EYXR.... – Aman Apr 15 '11 at 04:43
  • So I read the headers using a PHP function like apache_request_headers() and then submit them to retrieve the data behind those JavaScript-based links? Will it work? Not sure... I will give it a go – Aman Apr 21 '11 at 12:56
  • After a lot of trial and error, the solution I settled on was to read the header, get the POST data from it, and then use cURL to send that POST to load and scrape the page. Thanks a lot for the idea! Cheers, man – Aman Jun 11 '11 at 16:43
  • I did the same thing [here](http://stackoverflow.com/q/42032932/3357517), but I don't know what to do with the response – mrGreenBrown Feb 04 '17 at 06:52