I made a screen scraping module which works fine, but with certain limitations. Now I want to remove those limitations, but I am getting unpredictable and varying errors. Before you jump to conclusions, let me explain what is actually happening. Initially I used screen scraping to retrieve results for one set of keywords (search content) from all of Google's search engines, such as co.in/co.uk/nl/de/com.
But now I have to run the scraping logic for multiple search engines and multiple keywords in a loop.
Let's look at an example:
keyword     se             company   rank
telephony   google.co.in   airtel    01
telephony   google.co.in   bsnl      04
telephony   google.co.in   aircel    06
telephony   google.co.in   idea      03
mobile op   google.co.uk   airtel    09
mobile op   google.co.uk   bsnl      04
and so on, for more than 6 keywords, all the search engines shown above, and all companies.
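For illustration only, the combinations in the table could be generated and ranked roughly like this. It is a simplified sketch, not the actual module: the function names `searchJobs` and `rankCompanies` are hypothetical, it assumes every keyword is searched on every engine, and it assumes a company counts as found when its name appears in a result URL.

```php
<?php
/**
 * Yield one (keyword, search engine) job at a time, so the full
 * list of combinations never has to be held in memory at once.
 */
function searchJobs(array $keywords, array $engines): Generator
{
    foreach ($keywords as $keyword) {
        foreach ($engines as $engine) {
            yield ['keyword' => $keyword, 'se' => $engine];
        }
    }
}

/**
 * Given the ordered result URLs for one job, return the 1-based rank
 * of each company, or null if the company was not found.
 * Matching by substring in the URL is an assumption of this sketch.
 */
function rankCompanies(array $resultUrls, array $companies): array
{
    $ranks = array_fill_keys($companies, null);
    foreach (array_values($resultUrls) as $index => $url) {
        foreach ($companies as $company) {
            if ($ranks[$company] === null && stripos($url, $company) !== false) {
                $ranks[$company] = $index + 1; // first hit wins
            }
        }
    }
    return $ranks;
}
```

Each job would then be scraped and ranked on its own, and only the small `$ranks` array kept, instead of collecting every page for every keyword before processing.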
Initially I was retrieving it for one keyword, one search engine, and all companies, but now I have to build a list for all keywords, search engines, and companies. I simply used loops to do that, but I ran into these errors:
- memory allocated 343322111 bytes overflowed(... [to get rid of this I raised the memory limit with ini_set()]
- after some requests Google shows a CAPTCHA.
To avoid the CAPTCHA I tried sleep() and usleep(), but that does not solve the problem. In the end I get ERROR: connection reset.
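One way to space out requests without a fixed long sleep is a delay that grows only after failures and includes some randomness. The following is a minimal sketch under assumptions: the function name `nextDelaySeconds` and the default values (2 second base, 60 second cap, up to 50% jitter) are hypothetical, and there is no guarantee that this is enough to keep the CAPTCHA away.

```php
<?php
/**
 * Seconds to wait before the next request.
 * The base delay doubles for every consecutive failure, is capped,
 * and gets up to 50% random jitter added on top.
 */
function nextDelaySeconds(int $failures, float $base = 2.0, float $cap = 60.0): float
{
    $delay  = min($cap, $base * (2 ** $failures));
    $jitter = $delay * (mt_rand(0, 500) / 1000); // 0% to 50% of the delay
    return $delay + $jitter;
}

// Usage: sleep() takes whole seconds, usleep() takes microseconds.
$failures = 0; // reset to 0 after a successful request, increment after a failed one
$microseconds = (int) round(nextDelaySeconds($failures) * 1000000);
// usleep($microseconds);
```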
I can't use a delay of 30 seconds or more, because it would then take hours to retrieve the information. My code searches 5 pages of Google results, which means 50 results. The library I am using is simple_html_dom.php.
It works fine for 1 page, but not for more than 3 pages. What should I do or use?
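For reference, the 5 pages per search can be sketched as follows. This assumes 10 results per page and the `q` and `start` query parameters of Google's search URL; the function name `pageUrls` is hypothetical, and the commented-out calls only indicate where simple_html_dom would be used, with each page's DOM released before the next one is loaded.

```php
<?php
/**
 * Build the result-page URLs for one keyword on one search engine.
 * Assumes 10 results per page, so 5 pages cover results 1 to 50.
 */
function pageUrls(string $engine, string $keyword, int $pages = 5, int $perPage = 10): array
{
    $urls = [];
    for ($page = 0; $page < $pages; $page++) {
        $query  = http_build_query(['q' => $keyword, 'start' => $page * $perPage]);
        $urls[] = "https://www.{$engine}/search?{$query}";
    }
    return $urls;
}

foreach (pageUrls('google.co.in', 'telephony') as $url) {
    // $html = file_get_html($url);    // from simple_html_dom.php
    // ... extract the result links from $html here ...
    // $html->clear(); unset($html);   // free the DOM before the next page
}
```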