
I'm having some issues with a Deadline Exceeded error. Basically, I'm doing some web scraping on a URL using Mechanize. When I try to perform

br.open(url)

I get this error:

HTTPException: Deadline exceeded while waiting for HTTP response from URL: my-url

I have read the documentation, which says to use backends (I'm using a dynamic backend, class B4_1G, with 5 instances), but the error still happens after 60 seconds. According to the docs, when using a TaskQueue and backends the timeout should be extended to 10 minutes.

Here is how I assign the operation to run on a TaskQueue, with its target set to the first instance of my backend.

taskqueue.add(url='/crons/myworker', target='1.myworker')

Here is the backends.yaml.

backends:
- name: myworker
  class: B4_1G
  instances: 5
  options: dynamic

Any idea what might be happening? Thank you.

rogcg

1 Answer


No request that involves getting data via HTTP can take more than 60 seconds on App Engine.

The 10-minute limit refers to the tasks themselves: they can run for up to 10 minutes.

So GAE might not be the best choice here, as you can only use its provided versions of urlfetch etc., if your requests are going to take longer than 60 seconds on average anyway.

You can set a deadline for a request, the most amount of time the service will wait for a response. By default, the deadline for a fetch is 5 seconds. The maximum deadline is 60 seconds for HTTP requests and 10 minutes for task queue and cron job requests.

https://developers.google.com/appengine/docs/python/urlfetch/

So a task can run for up to 10 minutes, and a URL fetch for 60 seconds at most. It does not matter whether you perform the urlfetch operation from a frontend or a backend; the limit is the same.
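Since Mechanize ultimately goes through urllib2/httplib, which the Python 2.7 runtime backs with urlfetch, one thing worth trying is raising the default fetch deadline from 5 seconds to that 60-second ceiling before opening the URL. A minimal sketch, assuming that runtime (the URL is a placeholder):

import mechanize
from google.appengine.api import urlfetch

# Raise the default urlfetch deadline from 5 seconds to the
# 60-second maximum. Mechanize goes through urllib2/httplib,
# which App Engine routes through urlfetch, so the deadline
# applies to br.open() as well.
urlfetch.set_default_fetch_deadline(60)

br = mechanize.Browser()
response = br.open('http://example.com/slow-page')  # placeholder URL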

Paul Collingwood
  • I appreciate your help; you really cleared up some doubts I had. But when you say **"So GAE might not be the best choice here, as you can only use its provided versions of urlfetch"**, do you mean I should use another cloud service? Isn't there another approach to perform such operations? The problem is not that the query always takes more than 60s. The link works fine most of the time, but some requests last longer than 60s, and that is the problem I'm facing here. IDK, maybe their servers are slow, etc. So isn't there any approach, maybe retrying after the error happens? Thanks. – rogcg Dec 27 '13 at 18:37
  • 1
    Yeah, my understanding is 60 seconds is a hard limit for those requests and there's no way around it. So if that's going to be a problem, change now! :P – Paul Collingwood Dec 27 '13 at 18:44
  • 1
    Sure, retying failed queries might well be the way to do it and probably a good idea in any case. – Paul Collingwood Dec 27 '13 at 18:45
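A minimal retry sketch along those lines, assuming the deadline surfaces as httplib.HTTPException (as in the question's traceback); the function name, attempt count, and backoff are illustrative, not GAE settings:

import time
import httplib
import mechanize

def open_with_retries(br, url, attempts=3, backoff=5):
    # Retry br.open() when the fetch deadline is exceeded.
    # attempts and backoff are illustrative defaults.
    for attempt in range(attempts):
        try:
            return br.open(url)
        except httplib.HTTPException:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(backoff)  # affordable inside a 10-minute task

br = mechanize.Browser()
response = open_with_retries(br, 'http://example.com/slow-page')  # placeholder URL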