Python Requests taking a long time

Question

Basically, I am working on a Python project where I download and index files from the SEC EDGAR database. The problem, however, is that when using the requests module, it takes a very long time to save the text in a variable (between ~130 and 170 seconds for one file).

The file has roughly 16 million characters, and I wanted to see if there was any way to easily lower the time it takes to retrieve the text. Example:

import requests

url = "https://www.sec.gov/Archives/edgar/data/0001652044/000165204417000008/goog10-kq42016.htm"

r = requests.get(url, stream=True)

print(r.text)

Thanks!

Seems to work under a second for me. Is your computer just slow? `Runtime = 0.239601135254` — itzmurd4, Aug 07 '17 at 17:31
So I just checked my network speed: download is 113.03 Mbps and upload is 5.99 Mbps. The CPU I'm using is an AMD A10-7700K — Jake Schurch, Aug 07 '17 at 17:40
@Mukul215 does the runtime include the `print(r.text)` statement? — Jake Schurch, Aug 07 '17 at 17:45
@JakeSchurch, Yes it does. https://stackoverflow.com/questions/5622976/how-do-you-calculate-program-run-time-in-python Use the Quick Alternate and check your runtime — itzmurd4, Aug 07 '17 at 17:47
@Mukul215 well... I got similar results to yours after running the script in the command prompt rather than in my IDE (Atom). Is this unusual? — Jake Schurch, Aug 07 '17 at 17:53
Decoding and printing 15MB of data to your console is often slower than loading data from a network connection. Don't print all that data. Just write it straight to a file. — Martijn Pieters, Aug 07 '17 at 17:55
Please don't mark questions as solved. If you have a solution you wish to share, write your *own* answer to your question, and then accept it. — Kuba hasn't forgotten Monica, Aug 07 '17 at 20:22

score 12 · Answer 1 · answered Jan 15 '18 at 20:19

The slowdown I found is in the code for r.text, specifically when no encoding was given (r.encoding is None). The time spent detecting the encoding was 20 seconds; I was able to skip it by defining the encoding.

...
r.encoding = 'utf-8' 
...

Additional details

In my case, the response did not declare an encoding. It was 256k in size, and r.apparent_encoding was taking 20 seconds.

Looking into the text property: it checks whether an encoding is set. If the encoding is None, it falls back to the apparent_encoding property, which scans the content to autodetect the encoding scheme.

On a long string this will take time. By defining the encoding of the response (as described above), you skip the detection.
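As a minimal sketch of that workaround: the helper below sets an encoding only when the response did not declare one, then reads the text. The function name `text_with_known_encoding` and the `'utf-8'` fallback are assumptions for illustration, not part of requests. If the real encoding differs from the fallback, some characters may come out garbled.

```python
def text_with_known_encoding(response, fallback_encoding="utf-8"):
    """Return response.text, skipping charset detection when possible.

    `response` is expected to behave like a requests.Response
    (an `encoding` attribute and a `text` property).
    The utf-8 fallback is an assumption; use the real encoding if you know it.
    """
    if response.encoding is None:
        # No charset was declared, so .text would fall back to
        # apparent_encoding and scan the whole body. Setting an
        # encoding first avoids that scan.
        response.encoding = fallback_encoding
    return response.text
```

With a real response this could be called as `text = text_with_known_encoding(requests.get(url))`.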

Validate that this is your issue

In your example above:

from datetime import datetime    
import requests

url = "https://www.sec.gov/Archives/edgar/data/0001652044/000165204417000008/goog10-kq42016.htm"

r = requests.get(url, stream=True)

print(r.encoding)

print(datetime.now())
enc = r.apparent_encoding
print(enc)

print(datetime.now())
print(r.text)
print(datetime.now())

r.encoding = enc
print(r.text)
print(datetime.now())

Of course, the timestamps may get lost in the printed output, so I recommend running the above in an interactive shell; it may become more apparent where you are losing the time, even without printing datetime.now().
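To keep the timings from getting buried in the page text, one option is to measure each step and print only the durations. This is a rough sketch; the `timed` helper is a hypothetical name introduced here, and the usage lines assume `r` is the response from the example above.

```python
import time


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical usage with a response `r` from requests.get(url):
#   enc, t_detect = timed(lambda: r.apparent_encoding)  # detection only
#   r.encoding = enc                                    # skip detection from here on
#   text, t_decode = timed(lambda: r.text)              # decoding only
#   print(enc, round(t_detect, 3), len(text), round(t_decode, 3))
```

If the detection time dominates, the encoding is the issue; if both are small, the time is probably going into printing.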

That's a brilliant answer, thanks! We ran into a similar issue that was randomly occurring. I was able to track it down to response.text and replaced it with response.content, because that returns raw bytes, and the performance was normal at that point. However, I couldn't find much explanation online. This answer really brings clarity! — Simon Ninon, Aug 13 '20 at 19:34

score 1 · Answer 2 · answered Aug 07 '17 at 22:10

From @martijn-pieters

Decoding and printing 15MB of data to your console is often slower than loading data from a network connection. Don't print all that data. Just write it straight to a file.

Jake Schurch
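One way to follow that advice is to stream the body to disk in chunks instead of decoding and printing it. This is a sketch under assumptions: `save_chunks`, the chunk size, and the output filename are made up for illustration.

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return bytes written."""
    written = 0
    with open(path, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += fh.write(chunk)
    return written


# Hypothetical usage (filename and chunk size are arbitrary choices):
#   r = requests.get(url, stream=True)
#   save_chunks(r.iter_content(chunk_size=65536), "goog10-k.htm")
```

Because the bytes are written as received, nothing is decoded and no encoding detection runs.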
