
I am trying to read the following JSON file and then write its contents to a CSV file. The file comes directly from Google Cloud BigQuery (the CSV export doesn't work for some reason). I get the JSONDecodeError shown below when running this code:

import json

# Path to the JSON file to import
filename = "C:/Users/abc/Documents/Python Scripts/news_results.json"

#Read JSON data into the datastore variable
with open(filename) as f:
    datastore = json.load(f)

The JSON file is a list of objects, each with an article_text, a category and a confidence field, where article_text holds the full text of a news article:

[
  {
    "article_text": "Intel unveils 's photonics technology lab. Writing in the journal Nature,
Dr Paniccia - uter chips,\" he said.\n",
    "category": "/Computers & Electronics/Computer Hardware/Computer Components",
    "confidence": "0.5"
  },
  {
    "article_text": "Rank 'set to by gaming fans.\n",
    "category": "/Computers & Electronics/Consumer Electronics/Game Systems & Consoles",
    "confidence": "0.8399999737739563"
  }
]
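For reference, once json.load succeeds, the CSV step would look roughly like this. This is only a minimal sketch assuming the list-of-dicts structure shown above; the output file name news_results.csv is made up:

import csv

# Write the parsed records out to CSV (hypothetical output file name)
with open("news_results.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=["article_text", "category", "confidence"])
    writer.writeheader()
    writer.writerows(datastore)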

The full traceback:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-9-8fd516f33e36> in <module>()
      6 #Read JSON data into the datastore variable
      7 with open(filename) as f:
----> 8     datastore = json.load(f)

~\AppData\Local\Continuum\anaconda3\lib\json\__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    297         cls=cls, object_hook=object_hook,
    298         parse_float=parse_float, parse_int=parse_int,
--> 299         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    300 
    301 

~\AppData\Local\Continuum\anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

~\AppData\Local\Continuum\anaconda3\lib\json\decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

~\AppData\Local\Continuum\anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
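The "line 1 column 1 (char 0)" part means the decoder fails on the very first character of the file. A quick diagnostic sketch (not part of the script above) to see what is actually there, e.g. whether the file starts with a byte order mark:

with open(filename, "rb") as f:
    # A UTF-8 BOM shows up as b'\xef\xbb\xbf', a UTF-16 BOM as b'\xff\xfe' or b'\xfe\xff'
    print(repr(f.read(40)))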

Edit: Using Anaconda Spyder, I get the following traceback:

Traceback (most recent call last):

  File "<ipython-input-2-3cca112a7d99>", line 1, in <module>
    runfile('C:/Users/abc/Documents/Python Scripts/Jsonimport.py', wdir='C:/Users/tmeyer/Documents/Python Scripts')

  File "C:\Users\abc\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\abc\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/abc/Documents/Python Scripts/Jsonimport.py", line 15, in <module>
    datastore = json.load(f)

  File "C:\Users\abc\AppData\Local\Continuum\anaconda3\lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

  File "C:\Users\abc\AppData\Local\Continuum\anaconda3\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)

  File "C:\Users\abc\AppData\Local\Continuum\anaconda3\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())

  File "C:\Users\abc\AppData\Local\Continuum\anaconda3\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value
Tim
  • Yesterday [I learned that BigQuery works with jsonlines](https://stackoverflow.com/a/51595484/4799172). If that's the case for what queries return too, try my answer [here](https://stackoverflow.com/a/46256388/4799172) – roganjosh Jul 31 '18 at 09:09 (see the line-by-line sketch after these comments)
  • Try getting to your file using the `os` package; you can replace your filename variable with `filename = os.path.join(os.getcwd(), 'news_results.json')` if the JSON file and the script are in the same directory – Mohammed Janati Idrissi Jul 31 '18 at 09:11
  • @JanaTii how would that help? The OP isn't getting a FileNotFound error, they're getting a decode error. – roganjosh Jul 31 '18 at 09:12
  • I took your JSON and validated it using jsonlint, getting errors on line two; I corrected them, put it inside a string and tested your code here: https://repl.it/repls/FrenchBusyColdfusion As this worked, I assume your JSON is indeed invalid. Do you actually have that `newline` on line 3? – Agash Thamo. Jul 31 '18 at 09:17
  • @roganjosh tried your method and received this error on the last line: TypeError: a bytes-like object is required, not 'str' – Tim Jul 31 '18 at 09:18
  • Python 2/3 issue. Change `with open('the file.json', 'rb') as infile:` to `with open('the file.json', 'r') as infile:` and try please. If that works I will have to update my old answer. In other words, open in `r` mode not `rb` – roganjosh Jul 31 '18 at 09:21
  • @roganjosh Then I get the same error as in my original request. – Tim Jul 31 '18 at 09:24
  • Let me do some testing locally – roganjosh Jul 31 '18 at 09:25
  • @roganjosh Citing "The mode argument is optional; 'r' will be assumed if it’s omitted." - https://docs.python.org/3.3/tutorial/inputoutput.html He originally omitted it, so that cannot be the problem right? – Agash Thamo. Jul 31 '18 at 09:25
  • @AgashThamo. in my previous answer I linked to I opened the file in `rb` mode. Now I need to look at how to make it work in Python 3 as back then I was using 2.7 and the OP was also reading in `rb` mode there – roganjosh Jul 31 '18 at 09:26
  • @AgashThamo I'm currently trying to use the site you linked, but it's apparently having some issues. The file comes directly from Google Cloud's BigQuery; it is an export from a table query. I only used JSON because the CSV export does not work properly. – Tim Jul 31 '18 at 09:29
  • I can't replicate the issue as long as I don't open the file in `rb` mode. Try `item.replace('\r\n', '')` instead of just replacing `\n` – roganjosh Jul 31 '18 at 09:32
  • Tim, thanks for clarifying; I missed that point in your question. I just tested locally using a file and the validated JSON from you, and it does work without problems. I cannot replicate it either using your original code or the code of @roganjosh. – Agash Thamo. Jul 31 '18 at 09:34
  • @AgashThamo. Keep in mind that the full JSON file is much bigger, as it contains many full news articles. And what do you mean by validated JSON? Did you do anything to it, or is that just the JSON text I posted saved in a file? – Tim Jul 31 '18 at 09:39
  • You can use [jsonlint](https://jsonlint.com/) and actually, the snippet isn't valid JSON as you've posted it, but it looks like you put your own newline in before `"Dr Paniccia"`. Once that string is all on one line it's valid. But I don't see how you'd get invalid JSON back from BigQuery. – roganjosh Jul 31 '18 at 09:42
  • Yes, I validated it using jsonlint, but as @roganjosh mentioned, I don't think BigQuery returns invalid JSON. – Agash Thamo. Jul 31 '18 at 10:09
  • @AgashThamo. But if the JSON's not invalid, then my code somehow has to be; otherwise I don't see how this could lead to an error. – Tim Jul 31 '18 at 10:46
  • I tested your exact code using a valid json file and it worked. Could you also share your environment details, maybe there's a known bug. – Agash Thamo. Jul 31 '18 at 11:33
  • @AgashThamo. I switched over to Anaconda Spyder, just to try it in a different environment, and got a different error, as you can see in my edit above. – Tim Jul 31 '18 at 12:08
  • Can you test using the console and standard Python? There might be issues with the encoding; can you check the encoding of your file too? – Agash Thamo. Jul 31 '18 at 12:30
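Following the jsonlines suggestion and the encoding question in the comments above: if BigQuery actually exported newline-delimited JSON (one object per line) rather than a single JSON array, json.load on the whole file fails and the records have to be parsed line by line instead. A minimal sketch under that assumption; the utf-8-sig encoding is only there to strip a possible BOM, which would also produce "Expecting value: line 1 column 1 (char 0)":

import json

records = []
# One JSON object per line; utf-8-sig silently drops a UTF-8 BOM if present
with open(filename, encoding="utf-8-sig") as f:
    for line in f:
        line = line.strip()      # drops trailing \n or \r\n
        if line:                 # skip blank lines
            records.append(json.loads(line))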
