
I use the requests package to fetch earthquake catalogs from the ISC earthquake bulletin website.

When the result table is small, everything works. But for a massive search, or a loop of searches (i.e., when I set different parameters and run requests in a loop), the site returns no data:

Sorry, but your request cannot be processed at the present time. Please try again in a few minutes.

Can anyone tell me how I can avoid having so many requests refused?

Here's my script:

# import package
import requests

url = 'http://www.isc.ac.uk/cgi-bin/web-db-v4?iscreview=on&out_format=CSV&ttime=on&ttres=on&tdef=on&amps=on&phaselist=&stnsearch=STN&sta_list=CLC&stn_ctr_lat=&stn_ctr_lon=&stn_radius=&max_stn_dist_units=deg&stn_top_lat=&stn_bot_lat=&stn_left_lon=&stn_right_lon=&stn_srn=&stn_grn=&bot_lat=&top_lat=&left_lon=&right_lon=&ctr_lat=&ctr_lon=&radius=&max_dist_units=deg&searchshape=GLOBAL&srn=&grn=&start_year=2009&start_month=7&start_day=01&start_time=00%3A00%3A00&end_year=2019&end_month=8&end_day=01&end_time=00%3A00%3A00&min_dep=&max_dep=&min_mag=6.0&max_mag=6.9&req_mag_type=Any&req_mag_agcy=Any&include_links=on&request=STNARRIVALS'

# fetch the catalog and print the raw CSV response
r = requests.get(url)

print(r.text)
Hao Mai
  • Perhaps they are blocking you for sending too many requests from a single IP. You could use some proxy servers or a VPN to test this. Also, waiting using the time module is always an option if this is not a time-sensitive task. – Jeremy Savage Aug 27 '21 at 08:22
  • Hi Jeremy, I can't agree with you more. This website definitely has some rule to block my IP if I request too frequently. I understand what you mean about a proxy server, but I am not sure if the requests package can do that. Besides, about the time module, do you mean setting a time interval between two request sends? – Hao Mai Aug 27 '21 at 08:30
  • Hi Hao, it is possible to use a proxy server with requests, although it can be quite time consuming to understand and set up. https://stackoverflow.com/questions/8287628/proxies-with-python-requests-module – Jeremy Savage Aug 27 '21 at 08:34
  • If you use time.sleep(10), this will force your script to sleep for 10 seconds. Adding these in a for loop will rate limit your requests and potentially resolve your error. You may have to experiment to find a good sleep duration. – Jeremy Savage Aug 27 '21 at 08:36
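The time.sleep pacing suggested in the comments can be sketched as follows. This is a minimal, hedged sketch: the fetch stub stands in for the real requests.get call so the pacing logic is runnable offline, and the yearly URL loop is a placeholder, not taken from the question.

```python
import time

def fetch(url):
    # stand-in for requests.get(url).text; replace with the real call
    return f"data from {url}"

# hypothetical loop: one ISC query per search window
urls = [f"http://www.isc.ac.uk/cgi-bin/web-db-v4?start_year={y}&request=STNARRIVALS"
        for y in (2009, 2010)]

results = []
for url in urls:
    results.append(fetch(url))
    time.sleep(1)  # pause between requests; increase if the server still refuses
```

The exact sleep duration is something to tune experimentally, as the comments note; a server that still refuses at 1 second may accept requests spaced 10 or more seconds apart.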

1 Answer


Use the Retry mechanism of HTTPAdapter to automatically re-send the request when a temporary failure happens. Some settings you may be interested in:

  1. total - The total number of retries to allow. If the limit is reached without a successful response, the request is considered a failure.
  2. backoff_factor - Since failed requests usually happen when the server is overloaded, it helps to add a delay between retries so the server can breathe. The delay grows exponentially: think of the 1st retry happening after 1 second, the 2nd after 2 seconds, the 3rd after 4 seconds, the 4th after 8 seconds, the 5th after 16 seconds, and so on, up to the configured BACKOFF_MAX.
  3. allowed_methods - The HTTP methods that you want to retry.
  4. status_forcelist - The HTTP status codes that should be retried. Commonly this is the 5xx series, since those errors originate from the server and the request might succeed if retried.
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

retry_strategy = Retry(
    total=5,                           # give up after 5 retries
    backoff_factor=0.1,                # exponential delay between retries
    allowed_methods=["GET"],           # only retry idempotent GET requests
    status_forcelist=[500, 502, 504],  # retry on these server errors
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)

response = http.get("https://google.com")
print(response.status_code)

In this example, if the response still fails, we know that a series of 5 retries was made, spaced out in time by a backoff factor of 0.1, and that all of them failed. But if the server failure isn't persistent, the request is highly likely to succeed on one of those spaced-out retries.
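On top of the retries, a fixed pause between loop iterations (as suggested in the comments on the question) can be combined with the retry-enabled session. This is an illustrative sketch: the yearly search windows, the delay value, and the inclusion of 429 in status_forcelist are assumptions, not part of the original answer.

```python
import time
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=Retry(
    total=5,
    backoff_factor=1.0,                          # longer waits between retries
    allowed_methods=["GET"],
    status_forcelist=[429, 500, 502, 503, 504],  # 429 = Too Many Requests, if the server sends it
)))

for year in range(2009, 2011):                    # hypothetical yearly search windows
    # response = session.get(...)                 # real ISC request goes here
    time.sleep(1)  # fixed pause between catalog queries, on top of retry backoff
```

The fixed sleep throttles how fast new queries are issued, while the Retry strategy handles transient failures of each individual query; the two mechanisms address different parts of the problem described in the question.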

  • Thanks for your detailed explanation @Niel Godfrey Ponciano. It looks really helpful for the status-500 failure problem. I am not sure if my question fits this case exactly; my situation seems to involve multiple troubles, i.e. too much returned web content, too many requests, etc. I think some of them will raise status_code=500, but perhaps not all of them. I will test your solution soon; at least it will solve part of the problem. Thanks for your advice! – Hao Mai Aug 28 '21 at 06:52