I have created a JSON file from my Google search API results. I'm trying to read the file and parse the objects.

Each search result is one JSON object, like the one shown below. I have 200 of these objects in a single JSON file.

{
  "kind": "customsearch#result",
  "title": "text here",
  "htmlTitle": "text here",
  "link": "link here",
  "displayLink": "text here",
  "snippet": "text here",
  "htmlSnippet": "text here",
  "cacheId": "ID string",
  "formattedUrl": "text here",
  "htmlFormattedUrl": "link here",
  "pagemap": {
  "metatags": [
    {
      "viewport": "width=device-width, initial-scale=1"
    }
  ],
  "Breadcrumb": [
    {
      "title": "text here",
      "url": "link here",
    },
    {
      "title": "text here",
      "url": "link here",
    },
    {
      "title": "text here",
      "url": "link here",
    },
    {
      "title": "text here",
      "url": "link here",
    }
  ]
}

I'm having an issue reading the JSON file into json.load(s).

How do I read this file and start parsing the items?

def ingest_json(input):
    try:
        with open(input, 'r', encoding='UTF-8') as f:
            json_data = json.loads(f)
    except Exception:
        print(traceback.format_exc())
        sys.exit(1)

throws this error:

TypeError: the JSON object must be str, bytes or bytearray, not 'TextIOWrapper'

def ingest_json(input):
    try:
        with open(input, 'r', encoding='UTF-8') as f:
            json_data = json.load(f)
    except Exception:
        print(traceback.format_exc())
        sys.exit(1)

throws this error:

    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 269 column 2 (char 10330)
Life is complex
  • first code: use `load` instead of `loads` – eyllanesc Aug 25 '18 at 17:52
  • Your JSON is not valid; use https://jsonlint.com/ to verify it. For example, in `"url": "link here",}` the trailing comma is not allowed, so remove it. – eyllanesc Aug 25 '18 at 17:54
  • Possible duplicate of https://stackoverflow.com/questions/44035799/cant-read-json-file-with-python-getting-type-error-json-object-is-textiowrap – prithajnath Aug 25 '18 at 17:56

1 Answer

In json.loads(), the 's' stands for string: it parses JSON text passed as a str (or bytes/bytearray), not a file object, which is why passing the open file raises the TypeError.
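
To make the difference concrete, here is a minimal sketch. The file name `results.json` and its contents are made up for illustration; they are not the asker's data.

```python
import json
import os
import tempfile

# Hypothetical sample file holding one well-formed JSON object.
path = os.path.join(tempfile.mkdtemp(), "results.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"kind": "customsearch#result", "title": "text here"}')

# json.load() takes a file object and reads the text from it.
with open(path, "r", encoding="utf-8") as f:
    from_file = json.load(f)

# json.loads() takes the text itself, so read the file first.
with open(path, "r", encoding="utf-8") as f:
    from_text = json.loads(f.read())
```

Both calls should give the same dictionary; the only difference is whether you hand over the file object or its contents.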

json.load() is the function you want for a file object. However, it is strict about the JSON being well formed, and a JSON document can contain only one top-level value, so a file holding several objects one after another fails with "Extra data".
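
As a rough illustration of the one-top-level-value rule, the sketch below uses two tiny made-up objects rather than the real search results. Parsing them back to back fails, while the same objects wrapped in a single array parse fine.

```python
import json

# Two objects one after another: not a single JSON document.
two_objects = '{"title": "a"}\n{"title": "b"}'

try:
    json.loads(two_objects)
    error_message = None
except json.JSONDecodeError as exc:
    # exc.msg holds the short message without the position details.
    error_message = exc.msg

# The same objects as elements of one top-level array are valid.
as_array = json.loads('[{"title": "a"}, {"title": "b"}]')
```

If you control how the file is written, storing the results as one array like this is probably the simplest fix.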

Try splitting the data into multiple files, each with a single object, or split the string into objects in your Python code before parsing. Also, see "Can json.loads ignore trailing commas?" for handling the trailing-comma problem.
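
One possible way to do the splitting in Python is `json.JSONDecoder.raw_decode()`, which parses one value and reports where it ended. The sketch below assumes the objects are separated only by whitespace (not commas). The helper names `strip_trailing_commas` and `iter_json_objects` are made up for this example, and the regex is a naive cleanup that would also alter a `,}` or `,]` sequence inside a string value.

```python
import json
import re


def strip_trailing_commas(text):
    """Naively drop commas that directly precede a closing } or ].

    Caveat: this is not string-aware, so it also changes matching
    sequences that happen to appear inside string values.
    """
    return re.sub(r",(\s*[}\]])", r"\1", text)


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample: two objects with trailing commas, one per line.
sample = '''
{"title": "first", "url": "link here",}
{"title": "second", "Breadcrumb": [{"url": "link here",},]}
'''

results = list(iter_json_objects(strip_trailing_commas(sample)))
```

For a real file, you would read its contents with `f.read()` and pass that text through the same two helpers.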

Kaeden Wile