33

I wrote a certain API wrapper using Python's requests library.

When it gets a response using requests.get, it attempts to parse as json and takes the raw content if it doesn't work:

resp = requests.get(url, ...)  

try:
    resp_content = resp.json()
except ValueError:
    resp_content = resp.content

return resp_content

This is correct for my purposes. The problem is how long it takes when the downloaded response is, for example, an image file. If the file is large, an extremely long time passes between entering the try and reaching the except after the JSON parse fails.

(I don't know if it takes super long for the .json() to error at all, or if once it errors it then takes a while to get into the except.)
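One way to tell these two cases apart would be to time the `.json()` call on its own. This is only a minimal sketch: `timed_parse` is a hypothetical helper name, and it accepts any object exposing `.json()` and `.content`, so a real `requests` response could be passed in.

```python
import time


def timed_parse(resp):
    """Return (content, seconds spent inside resp.json()).

    Hypothetical helper: `resp` is any object exposing .json() and
    .content, such as a requests.Response.
    """
    start = time.perf_counter()
    try:
        content = resp.json()
    except ValueError:
        # The timer is read before touching .content, so `elapsed`
        # covers only the failed parse attempt.
        elapsed = time.perf_counter() - start
        return resp.content, elapsed
    elapsed = time.perf_counter() - start
    return content, elapsed
```

If `elapsed` accounts for nearly all of the delay, the time is being spent inside `.json()` itself rather than after the exception is raised.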

Is there a way to see if resp is json-parsable without attempting to parse it with .json()? Something like resp.is_json, so I can instantly know which branch to take (resp.json() or resp.content), instead of waiting 30 seconds (large files can take minutes).

Adriaan
tscizzle
  • There is no way of knowing for sure unless you try to actually parse the file. What if the data got corrupted at the very last couple of bytes, rendering the entire JSON invalid? Or, are you asking if there is a way to know if the response *should* contain something that is JSON parsable? – juanpa.arrivillaga May 16 '17 at 22:50
  • Is the wider application just a one-off request or multiple? – roganjosh May 16 '17 at 22:50
  • @juanpa.arrivillaga _should_ would be fine. It's ok if sometimes it is mistaken. – tscizzle May 16 '17 at 22:53
  • Then I believe @dizzyf answer is what you are looking for. – juanpa.arrivillaga May 16 '17 at 22:54
  • @roganjosh Multiple. I'm using this function as part of a larger system which is repeatedly making this request. – tscizzle May 16 '17 at 22:54
  • In which case you might find use in [`requests-futures`](https://github.com/ross/requests-futures) to send async requests, in addition to the answers here. You can set a callback on them. – roganjosh May 16 '17 at 23:00
  • `json.loads()` fails in less than 2 seconds for me if I try to load a 300mb JSON object that's missing the last character. Since `.json()` loads the entire response into memory and caches the value of `.content`, your problem doesn't really make sense to me. What part of this process is slow? Can you make a self-contained test case? – Blender May 16 '17 at 23:02
  • @Blender It is when requesting an Attachment's 'Body' field in the Salesforce REST API. You probably would need to be authenticated with a Salesforce account in order to reproduce, so not worth. – tscizzle May 16 '17 at 23:08
  • @tscizzle: Do you have `simplejson`'s C extensions compiled? You can test this with `import simplejson; print(simplejson._import_c_make_encoder())`. If it prints `None`, you don't have them installed and the JSON decoder is falling back to pure-Python, which is much slower. – Blender May 16 '17 at 23:10
  • Please do not add answers to the question body itself. Instead, you should add it as an answer. [Answering your own question is allowed and even encouraged](https://stackoverflow.com/help/self-answer). – Adriaan Oct 07 '22 at 08:27

6 Answers

37

Depending on the consistency of the response, you could check if the returned headers include content-type application/json:

resp.headers.get('content-type') == 'application/json'
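Servers often append parameters such as a charset to this header value, and the header may be missing altogether, so a slightly more tolerant comparison can help. A minimal sketch, where `is_json_content_type` is a hypothetical helper name and not part of `requests`:

```python
def is_json_content_type(header_value):
    """Return True if a Content-Type header value declares JSON.

    Hypothetical helper; pass it resp.headers.get('content-type').
    """
    if not header_value:
        return False  # header missing or empty
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(';', 1)[0].strip().lower()
    return media_type == 'application/json'
```

With this, `is_json_content_type(resp.headers.get('content-type'))` would still be true for a value like `'application/json; charset=utf-8'`, and false when the header is absent.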

kolypto
dizzyf
  • This answer doesn't cover many common cases including when content-type == 'application/json; charset=utf-8' – Daniel Kats Jul 17 '21 at 03:17
  • @DanielKats For `application/json` a charset is actually not allowed. JSON must always be unicode. See also [this comment](https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean#comment28158510_9254967). – Basti Apr 17 '23 at 20:18
  • @Basti that may be the case however many API server frameworks will return that content-type by default – Daniel Kats Apr 19 '23 at 22:32
23

You can check if the content-type application/json is in the response headers:

'application/json' in response.headers.get('Content-Type', '')
Duong Le
  • This will fail if `Content-Type` returns `'application/json; charset=utf-8'` – edepe Sep 08 '21 at 18:30
  • @edepe He's checking for substring match using the `in` operator, so `'application/json; charset=utf-8'` will work fine. – Cnoor0171 Sep 27 '21 at 02:12
  • `'application/json' in response.headers.get('Content-Type', '')` in case the response does not have a `Content-Type` header. – nikhilweee Oct 06 '22 at 16:13
5

(Addressing Daniel Kats's comment on a previous answer)

You could check if the returned headers include Content-Type application/json:

response.headers.get('Content-Type', '').startswith('application/json')

By using startswith, you're accounting for all allowed formats from https://www.w3.org/Protocols/rfc1341/4_Content-Type.html.

That doesn't guarantee that the body will be valid JSON, but at least it will catch responses that aren't declared as JSON.
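Because the header is only a declaration, the check can be combined with the original try/except as a fallback. A minimal sketch, assuming a hypothetical helper named `parse_response` that accepts any object with `.headers`, `.json()` and `.content` (such as a `requests` response):

```python
def parse_response(resp):
    """Return parsed JSON when the response declares JSON, else raw bytes.

    Hypothetical helper; `resp` is any object with .headers, .json()
    and .content, such as a requests.Response.
    """
    content_type = resp.headers.get('Content-Type') or ''
    if content_type.lower().startswith('application/json'):
        try:
            return resp.json()
        except ValueError:
            # Declared as JSON, but the body did not parse.
            pass
    # Not declared as JSON (or parsing failed): skip .json() entirely.
    return resp.content
```

For a response declared as an image, `.json()` is never called, so the slow failed parse is avoided.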

Cœur
2

Have you tried -

# resp.ok is True for status codes below 400
if resp.ok:
    resp_content = resp.json()
else:
    resp_content = resp.content
Laxmikant Ratnaparkhi
0

If you are using a Session rather than calling requests.(METHOD) directly:

from requests import Session

class MySession(Session):
    def request(self, *args, **kwargs):
        res = super().request(*args, **kwargs)
        original_json = res.json  # keep the bound method before replacing it

        def wrapper():
            # Return None instead of raising when the body is not valid JSON.
            # The decode errors raised by .json() are subclasses of ValueError,
            # whether or not simplejson is installed.
            try:
                return original_json()
            except ValueError:
                return None

        res.json = wrapper
        return res

session = MySession()
res = session.get("https://api64.ipify.org")
# Compare against None so that valid but empty JSON (e.g. {}) still counts.
if res.json() is not None:
    print("ok")
-4

I would check the first few hundred bytes and count the number of JSON characters such as {, " and :. Or you could check for image signatures (JFIF, PNG, GIF89a).
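A rough sketch of the signature idea, operating on the first bytes of a body that has already been obtained. The names `IMAGE_SIGNATURES`, `looks_like_image` and `looks_like_json` are hypothetical, the signature list covers only a few formats, and the JSON guess assumes the payload is an object or an array:

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b'\x89PNG\r\n\x1a\n',  # PNG
    b'\xff\xd8\xff',       # JPEG (JFIF and Exif files both start this way)
    b'GIF87a',             # GIF, 1987 revision
    b'GIF89a',             # GIF, 1989 revision
)


def looks_like_image(head):
    """Return True if `head` (bytes) starts with a known image signature."""
    return head.startswith(IMAGE_SIGNATURES)


def looks_like_json(head):
    """Cheap guess: the first non-whitespace byte opens an object or array.

    Assumption: top-level JSON strings, numbers and literals are ignored.
    """
    stripped = head.lstrip(b' \t\r\n')
    return stripped[:1] in (b'{', b'[')
```

Both functions are only heuristics. A body that passes `looks_like_json` can still fail to parse, so the try/except around `.json()` remains useful.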

J. Doe