I have a screen-scraping script in PHP that I run from the command line on a GoDaddy shared LAMP server.
The script scrapes, parses, and stores the required information in a database. The entire process takes about 1.5 seconds per page, and it needs to scrape close to 10,000 pages. For each of those pages it also fetches cookies from two others, for a total of about 30k pages fetched with cURL.
The entire script will take about 5 hours to run. I have done some memory profiling, and memory consumption stays more or less constant throughout the run - it does not increase.
If I were to run the script overnight, would GoDaddy notice anything abnormal about it? CPU consumption should not be high, but how bad would the bandwidth consumption be when fetching 3 pages every 1.5 seconds for 5 hours? Would it be enough to raise alarms on GoDaddy's end?
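To put rough numbers on this, here is a back-of-the-envelope sketch of the load. The page count, fetches per page, and time per page come from the figures above. The average response size of 50 KB is purely an assumption, since I have not measured it, and `estimateLoad` is just an illustrative helper name.

```php
<?php
// Rough load estimate. $avgPageKb is an ASSUMED average response size,
// not a measured value; change it to match real pages.
function estimateLoad(int $pages, int $fetchesPerPage, float $secondsPerPage, float $avgPageKb): array
{
    $totalFetches = $pages * $fetchesPerPage;   // main page plus the two cookie pages
    $totalSeconds = $pages * $secondsPerPage;   // run time if nothing else slows it down

    return [
        'total_fetches'      => $totalFetches,
        'hours'              => $totalSeconds / 3600,
        'fetches_per_second' => $totalFetches / $totalSeconds,
        'total_mb'           => $totalFetches * $avgPageKb / 1024,
        'kb_per_second'      => $totalFetches * $avgPageKb / $totalSeconds,
    ];
}

// 10,000 pages, 3 fetches each, 1.5 s per page, assumed 50 KB per response.
$estimate = estimateLoad(10000, 3, 1.5, 50.0);
print_r($estimate);
```

That works out to 30,000 fetches at a steady 2 requests per second. Under the assumed 50 KB per response, that would be about 100 KB/s sustained and roughly 1.4 GB over the whole run; the real figure depends entirely on the actual page sizes.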
If so, I suppose I could split the run into batches: process 1500 pages, pause for one hour, then resume. Should I do that?
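The batching I have in mind would look something like the sketch below. `runInBatches` and the `$scrapePage` callable are hypothetical names; `$scrapePage` stands in for my existing fetch, parse, and store step for a single page.

```php
<?php
// Sketch only: process pages in fixed-size batches and pause between batches.
// $scrapePage is a hypothetical callable wrapping the existing per-page work.
function runInBatches(array $pageIds, int $batchSize, int $pauseSeconds, callable $scrapePage): int
{
    $batches   = array_chunk($pageIds, $batchSize);
    $lastIndex = count($batches) - 1;
    $done      = 0;

    foreach ($batches as $index => $batch) {
        foreach ($batch as $pageId) {
            $scrapePage($pageId);
            $done++;
        }
        // Pause after every batch except the final one.
        if ($index < $lastIndex) {
            sleep($pauseSeconds);
        }
    }

    return $done; // number of pages processed
}

// Intended usage (not run here, since it would sleep for hours):
// runInBatches(range(1, 10000), 1500, 3600, $scrapePage);
```

With 10,000 pages and batches of 1500, this gives 7 batches and 6 one-hour pauses, so the total run would grow by about 6 hours.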