
I am trying to use requests to pull information from the NPI API, but it takes over 20 seconds on average. If I access it via my web browser, it takes less than a second. I'm rather new to this and any help would be greatly appreciated. Here is my code.

import json
import sys
import requests

url = "https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=&last_name=&organization_name=&address_purpose=&city=&state=&postal_code=10017&country_code=&limit=&skip="

htmlfile = requests.get(url)

data = htmlfile.json()

for i in data["results"]:
    print(i)
  • That is really weird. It was just working a moment ago. I think I made a typo. This is the working link: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=&last_name=&organization_name=&address_purpose=&city=&state=&postal_code=10017&country_code=&limit=&skip= – Vineeth Bhuvanagiri Oct 10 '17 at 23:10
  • For me it takes about 1.3 seconds with curl and with Python – Paul Rooney Oct 10 '17 at 23:20
  • Is curl a different library than Requests? – Vineeth Bhuvanagiri Oct 10 '17 at 23:24
  • It's a [library and a command-line utility](https://curl.haxx.se/) built using that library. I'm using the latter, no Python involved. – Paul Rooney Oct 10 '17 at 23:25
  • I don't think curl is supported by Python 3.6. Is requests really that much slower than curl? I like how easy requests is to use. – Vineeth Bhuvanagiri Oct 10 '17 at 23:31
  • Are you sure your browser takes less than a second? Is the underlying TCP connection still open, is anything cached? Etc. And are you including printing in your time? Printing is **slow** – Nick is tired Oct 10 '17 at 23:35
  • Don't worry about curl. I was only using it to show that I was receiving similar request times when using python/requests and when not using them. The implication of that being that python is not causing the slow down you see. – Paul Rooney Oct 10 '17 at 23:35
  • Yes, it takes less than a second from my browser. I'm including the printing time, but I don't think printing would add another 22 seconds, would it? How do I tell if the underlying TCP connection is open or if anything is cached? – Vineeth Bhuvanagiri Oct 10 '17 at 23:38
  • Does it consistently take 20 seconds from python or did it just take that long one time? – Paul Rooney Oct 10 '17 at 23:40
  • It consistently takes about 20 seconds from Python, although for about 3 minutes this afternoon I was able to pull them a lot faster; now it's back to being slow. – Vineeth Bhuvanagiri Oct 10 '17 at 23:42
  • So I tried using selenium and that seems to be working. Is there a way I can still use requests and keep the time down, or should I just go with selenium even though it's a bit overkill for what I'm working on? – Vineeth Bhuvanagiri Oct 11 '17 at 00:10
  • @VineethBhuvanagiri you will have to dig in and see why requests is slower. You can try enabling logging in requests, and also look at the TCP exchanges in Wireshark. Wireshark isn't exactly beginner-friendly, but it's the best way to pick up on network-based latency issues. – Paul Rooney Oct 11 '17 at 00:26
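The logging suggestion from the comments can be tried with a few lines; with the root logger at DEBUG, urllib3 prints each connection attempt, so slow DNS lookups or handshakes become visible. A minimal sketch (the URL is shortened to the one parameter the question actually sets; the empty parameters should be equivalent to omitting them):

```python
import logging

import requests

# Root logger at DEBUG makes urllib3 log every connection it opens,
# so you can see where the time is spent before the response arrives.
logging.basicConfig(level=logging.DEBUG)

url = "https://npiregistry.cms.hhs.gov/api/?postal_code=10017"
requests.get(url)
```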

1 Answer


This might be due to the response being incorrectly formatted, or due to requests taking longer than necessary to set up the request. To solve these issues, read on:

Server response formatted incorrectly

A possible culprit is the parsing of the response rather than the request itself. You can check this by not reading the response you receive from the server (i.e. skip the `.json()` call). If the code is still slow, parsing is not your problem; if it is now fast, the problem likely lies in parsing the response.
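One way to run that check is to time the request and the parsing step separately; a rough sketch (the URL is shortened to the single parameter the question actually sets, which is an assumption on my part):

```python
import time

import requests

url = "https://npiregistry.cms.hhs.gov/api/?postal_code=10017"

t0 = time.perf_counter()
r = requests.get(url)   # network transfer (body is read here by default)
t1 = time.perf_counter()
data = r.json()         # JSON parsing only
t2 = time.perf_counter()

print(f"request: {t1 - t0:.2f}s, parse: {t2 - t1:.2f}s")
```

If the first number dominates, the time is spent on the network and parsing fixes won't help.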

  1. In case some headers are set incorrectly, this can lead to parsing errors which prevents chunked transfer (source).
  2. In other cases, setting the encoding manually might resolve parsing problems (source).

To fix those, try:

r = requests.get(url)
r.raw.chunked = True # Fix issue 1
r.encoding = 'utf-8' # Fix issue 2
print(r.text)

Setting up the request takes long

This mainly applies if you're sending multiple requests in a row. To save requests from having to set up the connection each time, you can use a requests.Session. It keeps the connection to the server open and configured, and also persists cookies as a nice benefit. Try this (source):

import requests
session = requests.Session()
for _ in range(10):
    session.get(url)
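To see whether connection setup really is the expensive part, you can time each request on the session; a sketch under the assumption that the server keeps the connection alive between requests (URL shortened to the one parameter from the question):

```python
import time

import requests

url = "https://npiregistry.cms.hhs.gov/api/?postal_code=10017"
session = requests.Session()

for i in range(3):
    t0 = time.perf_counter()
    session.get(url)
    print(f"request {i}: {time.perf_counter() - t0:.2f}s")

# If request 0 is much slower than requests 1 and 2, connection setup
# (DNS + TCP + TLS) was the expensive part, and the session reuse helps.
```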

Didn't solve your issue?

If that did not solve your issue, I have collected some other possible solutions here.
