import pandas as pd
import json
import requests
import time

t1 = time.time()  # start time, to measure how long the scrape takes
frames = []  # collect one DataFrame per pincode, then concatenate once at the end
for i in range(110000, 160000):
    try:
        response = requests.get("https://api.postalpincode.in/pincode/{}".format(i))
        data = json.loads(response.text)
        postOffices = pd.DataFrame(data[0]['PostOffice'])
        if not postOffices.empty:
            frames.append(postOffices)
    except requests.exceptions.ConnectionError:
        # skip this pincode instead of restarting the whole range
        continue
df = pd.concat(frames, ignore_index=True)
print("elapsed:", time.time() - t1)
- Have you tried profiling your code? 50,000 API calls is probably why you think it's "slow". Perhaps the API has some rate-limiting mechanism. – dspencer Apr 03 '20 at 07:12
- The server will not allow you to make this many requests quickly, because that makes a lot of work for the server and makes it harder for other people to use the website. You should check the API documentation first to see if there is a way to *get the information you want* with fewer requests. – Karl Knechtel Apr 03 '20 at 07:20
- [This answer](https://stackoverflow.com/questions/487258/what-is-a-plain-english-explanation-of-big-o-notation) might help. We call it Big-O Notation, but basically what @KarlKnechtel is suggesting is whether there's a way to parallelize your requests by grouping those 50,000 calls into a single SQL request (or the `pandas` equivalent). Unfortunately I don't know enough about `pandas` to help. – Nathan majicvr.com Apr 03 '20 at 07:45
- I have heard about asyncio and aiohttp, which can make a large number of API requests in parallel; I just don't know how to do it. – pankit Gujjar Apr 03 '20 at 08:41
- Can you please help me make a large number of parallel requests? – pankit Gujjar Apr 03 '20 at 08:41
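
A minimal sketch of the asyncio + aiohttp approach mentioned in the last comments, assuming the same `api.postalpincode.in` endpoint; the concurrency limit of 20 and the 30 s timeout are arbitrary choices, not documented limits of this API, and the service may still throttle or reject bursts of requests:

```python
import asyncio
import aiohttp
import pandas as pd

async def fetch(session, semaphore, pincode):
    """Fetch one pincode and return a DataFrame of its post offices, or None."""
    url = "https://api.postalpincode.in/pincode/{}".format(pincode)
    async with semaphore:  # cap the number of requests in flight at once
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
                data = await response.json(content_type=None)
        except (aiohttp.ClientError, asyncio.TimeoutError):
            return None
    post_offices = data[0].get('PostOffice') or []
    return pd.DataFrame(post_offices) if post_offices else None

async def main():
    semaphore = asyncio.Semaphore(20)  # arbitrary concurrency cap, not an API limit
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, semaphore, i) for i in range(110000, 160000)]
        frames = await asyncio.gather(*tasks)
    frames = [f for f in frames if f is not None]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

df = asyncio.run(main())
```

Bounding concurrency with a semaphore keeps the number of simultaneous connections reasonable; raising it speeds things up only until the server starts pushing back.
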
1 Answer
Before hammering a free API service for data-scraping purposes, some simple arithmetic would serve you well:

1M requests @ 1 ms = 1,000 s
1M requests @ 50 ms = ~14 h
etc.

Your range covers 50,000 pincodes, so even at 100 ms per round trip that is well over an hour of pure network wait. I imagine you'll come up against rate limiting, so maybe you'd be better off scraping the data from the directory listing on their website.
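
If you stay with plain `requests`, one sketch of the polite-client side of this is a throttled loop over a shared `Session` with a basic back-off when the server signals it is overloaded; the 0.1 s pause and the 30 s back-off are guesses, not values taken from the API's documentation:

```python
import time
import requests
import pandas as pd

session = requests.Session()  # reuse one connection instead of reconnecting per call
frames = []

for pincode in range(110000, 160000):
    url = "https://api.postalpincode.in/pincode/{}".format(pincode)
    try:
        response = session.get(url, timeout=10)
    except requests.exceptions.RequestException:
        continue  # skip this pincode on network errors

    if response.status_code == 429:
        time.sleep(30)  # rate limited: back off, then move on (a fuller client would retry)
        continue
    if response.status_code != 200:
        continue

    post_offices = response.json()[0].get('PostOffice') or []
    if post_offices:
        frames.append(pd.DataFrame(post_offices))

    time.sleep(0.1)  # small pause between calls; tune to whatever the service tolerates

df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Even so, the arithmetic above still applies: 50,000 polite sequential calls take hours, so fewer, larger requests (or the site's own bulk listings) remain the better option.
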
