All,

I have a script that fetches JSON from a web server. It is as simple as the following:

import json
import requests

url = "http://foo.com/json"  # requests needs an explicit scheme in the URL
response = requests.get(url).content
data = json.loads(response)

But I noticed that sometimes, instead of returning the JSON object, it returns what looks like a response dump. See here: https://pastebin.com/fUy5YMuY

What confuses me is how to continue from there.

Right now I have taken the above Python and wrapped it in a try/except:

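response = None
try:
    url = "http://foo.com/json"
    response = requests.get(url).content
    data = json.loads(response)
except Exception as ex:
    # Save the raw body for inspection; response is still None if the request itself failed.
    if response is not None:
        with open("test.txt", "wb") as t:
            t.write(response)
    print("Error", sys.exc_info())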

Is there a way to catch this and then reparse it? Right now I get a ValueError. I was thinking of doing something like:

except Exception as ex:
    response = reparse(response)

But I'm still confused as to why it sometimes returns the JSON and other times the header info plus the content.

def reparse(response):
    """
    Catch the ValueError and attempt to reparse it for the JSON content
    """

Can I feed something like the pastebin dump into some sort of requests.Response class or similar?
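One possible shape for the `reparse` helper, as a minimal sketch rather than a confirmed fix: it assumes the unwanted header text comes before the JSON and that the payload is a JSON object or array. It scans for the first `{` or `[` that starts a decodable value and uses the standard library's `json.JSONDecoder.raw_decode`, which ignores trailing data instead of raising "Extra data". The sample strings in the usage note are made up for illustration and are not the contents of the pastebin.

```python
import json


def reparse(body):
    """Return the first JSON object or array embedded in body, or None.

    Assumption: any non-JSON preamble (e.g. header lines) precedes the payload.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    decoder = json.JSONDecoder()
    for index, char in enumerate(body):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one value starting at index and
            # tolerates whatever text follows it.
            value, _end = decoder.raw_decode(body, index)
            return value
        except ValueError:
            # Not a valid JSON value at this position; keep scanning.
            continue
    return None
```

For example, `reparse('HTTP/1.1 200 OK\r\n\r\n{"a": [1, 2]}')` skips the status line and decodes the object. A stray `[` or `{` inside the preamble that happens to be valid JSON would be returned first, so this is only a fallback for inspecting bad responses.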

Edit: Here is the full stack trace I am getting.

File "scrape_people_by_fcc_docket.py", line 82, in main
    json_data = get_page(limit, page*limit)
File "scrape_people_by_fcc_docket.py", line 13, in get_page
    data = json.loads(response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 16 column 367717 (char 3 - 368222)
 None

In the above code, the response variable is defined by:

response = requests.get(url).content

which is odd because most of the time, response is a JSON object which is completely parsable.

Ideally, when the content isn't JSON, I would like to somehow parse it for the actual content and then continue on.

Fallenreaper
  • The pastebin really does not look like a request dump. It is a reasonable response from an API. – Arpit Solanki Jul 15 '17 at 21:15
  • Does the ValueError come when you call json.loads? If so, please post the full traceback. – Arpit Solanki Jul 15 '17 at 21:16
  • @ArpitSolanki Usually it just returns JSON for me to manipulate. When I hit this error it says: *ValueError: Extra data: line A column A - line B column B*. The difference I have noticed is that `x.content` used to return pure JSON and now it returns the header data as well, which is unusual. The error is similar to https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data but it points to my `json.loads()` call instead. – Fallenreaper Jul 15 '17 at 21:18

1 Answer

Instead of using .text or .content, you can use the response's .json() method, which so far seems to resolve my issue. I am still testing and watching for errors and will update this as needed, but it seems that .json() returns the data I need without the headers, and it already calls json.loads (or similar) to parse the body.
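A minimal sketch of that approach, under some assumptions: `fetch_json` is a hypothetical helper name, the `session` parameter exists only so the function can be tried with a stand-in object, and the 10-second timeout is an arbitrary choice. `Response.json()` raises a `ValueError` (or a subclass of it, depending on the requests version) when the body is not valid JSON, so that case is caught and reported instead of crashing the script.

```python
def fetch_json(url, session=None, timeout=10):
    """Fetch url and return the decoded JSON body, or None if it is not JSON."""
    if session is None:
        # Imported lazily so the function can also be exercised with a stub session.
        import requests
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx responses instead of trying to parse them.
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # The body was not valid JSON; show the start of it for debugging.
        print("Body was not JSON:", response.content[:200])
        return None
```

Calling `fetch_json("http://foo.com/json")` would then return the parsed data, or `None` on the occasions when the server sends something else.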

Fallenreaper