
I have recently become interested in data and JSON files. I uploaded a file named "sarcasm.json" to the computer and wrote the following code.

import json

with open("sarcasm.json", 'r') as f:
  datastore = json.load(f)

sentences = []
urls = []
labels = []

for item in datastore:
  sentences.append(item["headline"])
  labels.append(item["is_sarcastic"])
  urls.append(item["article_link"])
#Note that this code was created by using the "NLP Zero to Hero Course in Machine Learning"

However, after running the code, I got an error saying:

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    datastore = json.load(f)
  File "/nix/store/p21fdyxqb3yqflpim7g8s1mymgpnqiv7-python3-3.8.12/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/nix/store/p21fdyxqb3yqflpim7g8s1mymgpnqiv7-python3-3.8.12/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/nix/store/p21fdyxqb3yqflpim7g8s1mymgpnqiv7-python3-3.8.12/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 208)

I have checked for typos but have currently found none. Could someone please point out where I went wrong?

The JSON file does seem to be the problem, but when I check line 2, column 1, I don't see anything wrong. Here are the first 3 lines of the JSON file:

{"is_sarcastic": 1, "headline": "thirtysomething scientists unveil doomsday clock of hair loss", "article_link": "https://www.theonion.com/thirtysomething-scientists-unveil-doomsday-clock-of-hai-1819586205"}
{"is_sarcastic": 0, "headline": "dem rep. totally nails why congress is falling short on gender, racial equality", "article_link": "https://www.huffingtonpost.com/entry/donna-edwards-inequality_us_57455f7fe4b055bb1170b207"}
{"is_sarcastic": 0, "headline": "eat your veggies: 9 deliciously different recipes", "article_link": "https://www.huffingtonpost.com/entry/eat-your-veggies-9-delici_b_8899742.html"}

Thank you, Paul10

  • 2
    It's a problem with the JSON file, not your Python code. You're showing us the wrong things entirely. – jasonharper Mar 25 '22 at 02:12
  • 1
    The error is telling you, very explicitly, that the problem has to do with *the contents of the file*, not your code. – Karl Knechtel Mar 25 '22 at 02:12
  • 2
    It is not JSON but multi-JSON. Every line is a separate JSON document. You have to convert every line separately. – furas Mar 25 '22 at 02:14
  • 1
    That format is also called JSONL (I've never heard "multi-JSON" used as a term). You can find a JSONL library for Python, or you can just load each line as a distinct document. – Charles Duffy Mar 25 '22 at 02:17
  • 2
    The linked duplicate, https://stackoverflow.com/questions/50475635/loading-jsonl-file-as-json-objects, has answers showing several working options. – Charles Duffy Mar 25 '22 at 02:19
  • You can run a `for` loop, like `for line in f: all_data.append(json.loads(line))`, but first create the list `all_data = []`. You will then need a different loop to split the records into separate lists. – furas Mar 25 '22 at 02:19
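
Putting the comments together: `json.load` expects the whole file to be a single JSON document, so after the first object ends it finds a second one at line 2, column 1 and reports "Extra data". A minimal sketch of the line-by-line approach is below, assuming every non-blank line of the file is one complete JSON object with the three keys shown in the question. The helper names `load_jsonl` and `split_fields` are made up for illustration and are not part of the `json` module.

```python
import json


def load_jsonl(path):
    """Parse a file that holds one JSON object per line (JSONL)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at the end
                records.append(json.loads(line))
    return records


def split_fields(records):
    """Split the parsed records into the three lists used in the question."""
    sentences = [item["headline"] for item in records]
    labels = [item["is_sarcastic"] for item in records]
    urls = [item["article_link"] for item in records]
    return sentences, labels, urls
```

With the file from the question, this would be called as `sentences, labels, urls = split_fields(load_jsonl("sarcasm.json"))`. Parsing each line with `json.loads` keeps the original file untouched; the alternative is to rewrite the file as one JSON array so that the original `json.load(f)` call works.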
