import pandas as pd
import requests
import time

t1 = time.time()
frames = []

# One request per pincode: 50,000 calls against the public API.
for i in range(110000, 160000):
    try:
        response = requests.get("https://api.postalpincode.in/pincode/{}".format(i))
        data = response.json()
        postOffices = pd.DataFrame(data[0]['PostOffice'])
        if not postOffices.empty:
            frames.append(postOffices)
    except requests.exceptions.ConnectionError:
        continue  # skip this pincode if the connection drops

df = pd.concat(frames, ignore_index=True)
print("elapsed: {:.1f}s".format(time.time() - t1))
    Have you tried profiling your code? 50,000 API calls is probably why you think it's "slow". Perhaps the API has some rate-limiting mechanism. – dspencer Apr 03 '20 at 07:12
  • The server will not allow you to make this many requests quickly, because that makes a lot of work for the server and makes it harder for other people to use the website. You should check the API documentation first to see if there is a way to *get the information you want* with fewer requests. – Karl Knechtel Apr 03 '20 at 07:20
  • [This answer](https://stackoverflow.com/questions/487258/what-is-a-plain-english-explanation-of-big-o-notation) might help. We call it Big-O Notation, but basically what @KarlKnechtel is suggesting is whether there's a way to parallelize your requests by grouping those 50,000 calls in a single SQL request (or the `pandas` equivalent). Unfortunately I don't know enough about `pandas` to help – Nathan majicvr.com Apr 03 '20 at 07:45
  • I have heard about asyncio and aiohttp, which can make a large number of API requests in parallel, but I don't know how to do it. – pankit Gujjar Apr 03 '20 at 08:41
  • Can you please help me make a large number of parallel requests? – pankit Gujjar Apr 03 '20 at 08:41

1 Answer


Before hammering a free API service for data scraping purposes, some simple arithmetic would serve you well.

1M requests @ 1 ms each = 1,000 s (~17 minutes)

1M requests @ 50 ms each = 50,000 s (~14 hours)

etc...
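
Applied to the 50,000 pincodes in the question, a minimal sketch of the same back-of-envelope estimate (the ~200 ms round trip is an assumption, not a measurement):

# Rough sequential runtime for 50,000 requests at an assumed 200 ms round trip.
n_requests = 160000 - 110000             # 50,000 pincodes
round_trip_s = 0.2                       # guessed per-request latency
print(n_requests * round_trip_s / 3600)  # ~2.8 hours, before any rate limiting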

I imagine you'll come up against rate limiting, so maybe you'd be better off scraping the data from the directory listing on their website.
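
If you do keep hitting the API, the asyncio/aiohttp approach mentioned in the comments at least overlaps the network waits. A minimal sketch, assuming a polite concurrency limit of 10 (the limit, the error handling, and the pincode range are assumptions, and the API may still throttle or block you):

import asyncio
import aiohttp
import pandas as pd

CONCURRENCY = 10  # assumption: keep the request rate polite on a free API

async def fetch(session, sem, pincode):
    url = "https://api.postalpincode.in/pincode/{}".format(pincode)
    async with sem:  # limit the number of in-flight requests
        try:
            async with session.get(url) as response:
                data = await response.json()
        except aiohttp.ClientError:
            return None  # skip pincodes that fail
    offices = data[0].get('PostOffice') or []
    return pd.DataFrame(offices) if offices else None

async def main():
    sem = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, sem, i) for i in range(110000, 160000)]
        frames = [f for f in await asyncio.gather(*tasks) if f is not None]
    return pd.concat(frames, ignore_index=True)

df = asyncio.run(main())

The semaphore bounds how many requests are in flight at once; raising it speeds things up but makes it more likely the service blocks you.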

ben