i'm new to web apps so i'm not so used to worrying about CPU limits, but i looks i am going to have a problem with this code. I read in google's quotas page that i can use 6.5 CPU hours per day an 15 CPU , minutes per minute.
Google Said:
CPU time is reported in "seconds," which is equivalent to the number of CPU cycles that can be performed by a 1.2 GHz Intel x86 processor in that amount of time. The actual number of CPU cycles spent varies greatly depending on conditions internal to App Engine, so this number is adjusted for reporting purposes using this processor as a reference measurement.
And
Per Day Max RateCPU Time 6.5 CPU-hours 15 CPU-minutes/minute
What i want to Know:
Is this script going over the limit?
(if yes)How can i make it not go over the limit?
I use the urllib library, should i use Google's URL Fetch API? Why?
Absolutely any other helpful comment.
What it does:
It scrapes (crawls) project free TV. I will only completely run it once then replace it with a shorter faster script.
from urllib import urlopen
import re
alphaUrl = 'http://www.free-tv-video-online.me/movies/'
alphaPage = urlopen(alphaUrl).read()
patFinderAlpha = re.compile('<td width="97%" nowrap="true" class="mnlcategorylist"><a href="(.*)">')
findPatAlpha = re.findall(patFinderAlpha,alphaPage)
listIteratorAlpha = []
listIteratorAlpha[:] = range(len(findPatAlpha))
for ai in listIteratorAlpha:
betaUrl = 'http://www.free-tv-video-online.me/movies/' + findPatAlpha[ai] + '/'
betaPage = urlopen(betaUrl).read()
patFinderBeta = re.compile('<td width="97%" class="mnlcategorylist"><a href="(.*)">')
findPatBeta = re.findall(patFinderBeta,betaPage)
listIteratorBeta = []
listIteratorBeta[:] = range(len(findPatBeta))
for bi in listIteratorBeta:
gammaUrl = betaUrl + findPatBeta[bi]
gammaPage = urlopen(gammaUrl).read()
patFinderGamma = re.compile('<a href="(.*)" target="_blank" class="mnllinklist">')
findPatGamma = re.findall(patFinderGamma,gammaPage)
patFinderGamma2 = re.compile('<meta name="keywords"content="(.*)">')
findPatGamma2 = re.findall(patFinderGamma2,gammaPage)
listIteratorGamma = []
listIteratorGamma[:] = range(len(findPatGamma))
for gi in listIteratorGamma:
deltaUrl = findPatGamma[gi]
deltaPage = urlopen(deltaUrl).read()
patFinderDelta = re.compile("<iframe id='hmovie' .* src='(.*)' .*></iframe>")
findPatDelta = re.findall(patFinderDelta,deltaPage)
PutData( findPatGamma2[gi], findPatAlpha[ai], findPatDelt)
If I forgot anything please let me know.
Update:
This is about how many times it will run and why in case this is helpfull in answering the question.
per cycle total
Alpha: 1 1
Beta: 16 16
Gamma: ~250 ~4000
Delta: ~6 ~24000