
I am using requests.get to retrieve data from a URL/REST endpoint. The data is supposed to be in JSON format. When I access the URL from a browser, it has the formatting you would expect, namely [{"variable1":value,"variable2":value}]. However, when I use requests.get to grab the data, the returned data is broken up and interleaved with additional lines, which appear to be hexadecimal numbers giving the number of characters on the subsequent line. Below is an example of what response.text returns.

 2000

 [{"variable1":v

 2000

 alue,"variable2

 1ecd

 ":value}]

 0

2000 base 16 = 8192 base 10, meaning there are 8,192 characters on the subsequent line. Note that in the above example, I purposely shortened the lines to not show 8,192 characters :).
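The layout described above (a hexadecimal length line, that many characters of data, and a final 0) resembles the framing used by HTTP/1.1 chunked transfer encoding. Nothing in this thread confirms that this is the cause, so the following is only a minimal sketch under that assumption. The helper name decode_chunked and the sample payload are hypothetical, and the sketch assumes CRLF line endings and ignores trailer headers.

```python
import json


def decode_chunked(raw: bytes) -> bytes:
    """Strip chunked-transfer framing from raw and return the bare payload.

    Hypothetical helper: assumes each chunk is '<hex size>\r\n<data>\r\n'
    and that a chunk of size 0 ends the body.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line may carry extensions after ';', which are ignored here.
        size = int(raw[pos:line_end].split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        start = line_end + 2
        body += raw[start:start + size]
        # Skip the data plus the CRLF that terminates the chunk.
        pos = start + size + 2
    return bytes(body)


# Made-up payload, split into three chunks to mimic the output in the question.
payload = b'[{"variable1": 1, "variable2": 2}]'
chunks = [payload[:10], payload[10:20], payload[20:]]
framed = b"".join(b"%x\r\n%s\r\n" % (len(c), c) for c in chunks) + b"0\r\n\r\n"

decoded = decode_chunked(framed)
data = json.loads(decoded.decode("utf-8"))
print(data)
```

If the assumption holds, the reassembled bytes parse as ordinary JSON. Normally the HTTP client removes this framing itself, so needing a helper like this would suggest something is off with the client or the server.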

Needless to say, this does not adhere to JSON format, so I cannot process it as expected. Any thoughts on why this may be happening? My guess was that it may have something to do with the size of the response received - len(response.content) indicates that it is 335,898 bytes in size - so I tried chunking the response as described here, but that did not affect the output of response.text. Appreciate any thoughts people can share :)

EDIT1: Running print(response.json()) yields the below error:

Traceback (most recent call last):
  File "IncomingMessagesCompanyIDEvent.py", line 105, in <module>
    data = getData()
  File "IncomingMessagesCompanyIDEvent.py", line 65, in getData
    text_file.write(response.json())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 858, in json
    self.content.decode(encoding), **kwargs
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 6)
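The position in that message is consistent with the body starting with the size line: the parser reads 2000 as a complete JSON number, skips the line break, and then finds more data at character 6. Here is a small sketch that reproduces the same message, assuming the body begins with "2000" followed by CRLF (the sample string is made up for illustration):

```python
import json

# Assumed start of the body: the size line "2000", a CRLF, then the JSON text.
body = '2000\r\n[{"variable1": 1}]'

message = None
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    # "2000" parses as a number, so everything after it counts as extra data.
    message = str(exc)

print(message)  # Extra data: line 2 column 1 (char 6)
```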

EDIT2: Issue resolved. See below for my answer.

skyfx

3 Answers


Have you tried response.json() instead of response.text?

cullywest
  • See the edit to my original post. I assume this error is caused because the response received from the GET command is not in JSON format. – skyfx Apr 05 '17 at 14:49
  • Can you filter out the non-json stuff using `response.iter_lines()`? – cullywest Apr 05 '17 at 17:31
  • Unfortunately, the issue is not just the non-json stuff, but the fact that the data is split up across lines. I've updated the example in my original post to reflect this. – skyfx Apr 05 '17 at 19:30

It looks like you have some very badly formed JSON in there. The first thing you should do is:

import re

# `response` is the object returned by requests.get(...)
text = response.text
# Filter out the newlines from the literal
text = re.sub('\n', '', text)
# Keep everything from the first '[' to the last ']'
filtered_response = re.search(r'\[.*\]', text).group(0)
# We still have a bad JSON string - '[{"variable1":v 2000 alue,"variable2 1ecd ":value}]'; values should be quoted, not only keys.

Now that you have the approximate value, you could play around with regexes to quote the values, but you should not have to. If you control the source of the string, try quoting the values there first.

Ilhicas

The issue has been resolved. It appears to have been related to the version of Python. I switched from using v3.5.1 to using v2.7.13, which fixed the issue. The JSON data is now returned in a single string without added hexadecimals.

skyfx