All,
I have a script in place that fetches JSON from a web server. It is as simple as the following:
import json
import requests

url = "foo.com/json"
response = requests.get(url).content
data = json.loads(response)
But I noticed that sometimes, instead of returning the JSON object, it returns what looks like a response dump. See here: https://pastebin.com/fUy5YMuY
What confuses me is how to continue from here.
Right now I have taken the above Python and wrapped it in a try/except:
try:
    url = "foo.com/json"
    response = requests.get(url).content
    data = json.loads(response)
except Exception as ex:
    with open("test.txt", "w") as t:
        t.write(response)
    print("Error", sys.exc_info())
Is there a way to catch this (right now I get a ValueError) and then reparse it? I was thinking of doing something like:
except Exception as ex:
    response = reparse(response)
But I'm still confused as to why it sometimes returns the JSON and other times the header info plus the content.
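One way to narrow down why the two kinds of response differ is to record the status code and Content-Type whenever parsing fails. The following is only a minimal sketch: it assumes the object passed in is a requests.Response (or anything exposing `status_code`, `headers` and `text`), it is written for Python 3, and the helper name `parse_or_report` is made up for illustration.

```python
import json


def parse_or_report(resp):
    """Return (data, None) on success, or (None, report) if the body is not JSON.

    `resp` is assumed to be a requests.Response, or any object exposing
    `status_code`, `headers` and `text`. The helper name is hypothetical.
    """
    try:
        return json.loads(resp.text), None
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        report = {
            "status": resp.status_code,
            "content_type": resp.headers.get("Content-Type"),
            "error": str(err),
            "body_start": resp.text[:200],  # first 200 characters of the body
        }
        return None, report
```

Comparing the reports from failing requests with the headers of successful ones may show whether the server, a proxy, or something else is producing the unexpected body.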
def reparse(response):
    """
    Catch the ValueError and attempt to reparse it for the JSON content
    """
Can I feed something like the pastebin dump into some sort of requests.Response class or similar?
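For illustration, the standard library can parse a raw HTTP response if it is wrapped in a fake socket. This sketch assumes the dump is a well-formed raw HTTP response (status line, headers, blank line, body), which may or may not match the pastebin content. It uses Python 3's `http.client` (the Python 2.7 equivalent is `httplib`); `FakeSocket`, `parse_raw_http` and the sample bytes are all made up for the example.

```python
from http.client import HTTPResponse  # Python 3; the Python 2.7 module is httplib
from io import BytesIO
import json


class FakeSocket:
    """Minimal stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, raw_bytes):
        self._file = BytesIO(raw_bytes)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_raw_http(raw_bytes):
    """Split a raw HTTP response dump into (status, headers, body)."""
    resp = HTTPResponse(FakeSocket(raw_bytes))
    resp.begin()  # reads the status line and the headers
    return resp.status, dict(resp.getheaders()), resp.read()


# Made-up sample dump, only to show the call.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b'{"a": 1}'
)
status, headers, body = parse_raw_http(raw)
data = json.loads(body.decode("utf-8"))
```

This yields a standard-library `HTTPResponse`, not a `requests.Response`, but it separates the headers from the body so the body can be handed to `json.loads`.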
Edit: Here is the full stack trace I am getting.
  File "scrape_people_by_fcc_docket.py", line 82, in main
    json_data = get_page(limit, page*limit)
  File "scrape_people_by_fcc_docket.py", line 13, in get_page
    data = json.loads(response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 16 column 367717 (char 3 - 368222)
None
In the above code, the response variable is defined by:
response = requests.get(url).content
which is odd, because most of the time response is a JSON object that parses without any problem.
Ideally, when the content isn't JSON, I would like to somehow parse it for the actual content and then continue on.
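A rough sketch of that idea, assuming the JSON payload is embedded somewhere inside the larger text: scan for an opening `{` or `[` and let `json.JSONDecoder.raw_decode` try to decode from that position, which tolerates trailing data. The function name `extract_first_json` is hypothetical, and the code is written for Python 3.

```python
import json


def extract_first_json(text):
    """Return the first JSON object or array embedded in `text`.

    Tries to decode at every '{' or '[' and returns the first value that
    decodes. Raises ValueError if nothing decodes. Hypothetical helper.
    """
    decoder = json.JSONDecoder()
    for pos, ch in enumerate(text):
        if ch in "{[":
            try:
                obj, _end = decoder.raw_decode(text, pos)
                return obj
            except ValueError:
                continue  # not valid JSON from this position; keep scanning
    raise ValueError("no JSON object or array found")
```

Note that this returns the first value that happens to decode, which is not necessarily the payload of interest. The "Extra data" error above means a short valid JSON value was already found at the very start of the body, so a check on the shape of the returned object would probably still be needed.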