
Introduction:

I'm currently working with JSON and Python, building a web scraper and analysing the data. However, the data I want to scrape is not always returned, because the IP gets blocked or rerouted, or because of any of the other problems one can run into when scraping. So I want to check whether the object exists in the JSON file before processing it further.

Goal:

To detect whether or not the object was grabbed by the scraper, using an if statement that searches the JSON file.

What I Have Tried:

  • Using `if data["caption"] in data:`,
  • Using a for loop over the data,
  • Calling the object directly as `if data["caption"]:` and `if "caption":`,
  • For the loop: `for s in range(len(data)):` and `if data[s]["caption"]:`

Errors:

  • KeyError: 'caption'
  • TypeError: list indices must be integers or slices, not str
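
For reference, both errors can be reproduced without any file at all. The snippet below is only a sketch: `data` is a made-up in-memory list standing in for the parsed file, and `error_name` is a hypothetical helper that reports which exception an expression raises. It also shows why `if "caption":` never fails: a non-empty string is always truthy, so that test says nothing about the data.

```python
# Hypothetical stand-in for the parsed file; the second entry lacks "caption".
data = [
    {"caption": "This is water.", "images": []},
    {"images": []},
]


def error_name(action):
    """Run `action` and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


# Indexing the list itself with a string: lists only accept integers or slices.
list_indexed_by_str = error_name(lambda: data["caption"])

# Indexing a dict with a key it does not contain.
missing_key = error_name(lambda: data[1]["caption"])

# A bare string literal is truthy regardless of what `data` holds.
bare_string_test = bool("caption")

print(list_indexed_by_str)  # TypeError
print(missing_key)          # KeyError
print(bare_string_test)     # True
```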

Code:

[{
  "caption": "This is water.",
  "images": [],
  "timestamp": "2022-02-08T03:44:20.000Z"
},
{
  "caption": "old man and the sea",
  "images": [],
  "timestamp": "2022-01-12T17:58:24.000Z"
},
{
  "caption": "from last night",
  "images": [],
  "timestamp": "2021-12-02T15:52:26.000Z"
}]

import json

with open('result.json') as f:
    data = json.loads(f.read())
    for s in range(len(data)):
        # Raises KeyError when the entry has no 'caption' key
        if data[s]["caption"]:
            print("Caption is here")
        else:
            print("Caption is not here")

Currently the code only works if the object 'caption' is in the file; however, it raises KeyError: 'caption' when it hits a part of the file that does not have 'caption' in it. Any help would be greatly appreciated, thanks!

  • Try using `data[s].get("caption")` instead of `data[s]["caption"]`. – Thomas May 12 '22 at 23:37
  • Try using `data[s].get('caption', None)` in the `if` condition, or use a try/except block and catch the KeyError exception. – Adnan Mohib May 12 '22 at 23:37
  • I've closed your question as a duplicate, but just in case it's not clear how it applies, you want `if "caption" in data[s]:`. If anything's still unclear, LMK. – wjandrea May 12 '22 at 23:38
  • @Muhammad `.get(..., None)` is redundant. You can just use `.get(...)`. – wjandrea May 12 '22 at 23:39
  • @wjandrea I agree; it's for better readability, to indicate the default value if the key is not found. – Adnan Mohib May 12 '22 at 23:41
  • BTW, what you're calling an "object" is actually called a *property* in JS terminology, or a *key* in Python terminology. – wjandrea May 12 '22 at 23:44
  • Also BTW, JSON is irrelevant to the problem since you're loading it just fine; the problem occurs *after* loading it; or in other words, if you put that data directly in your Python script, you'd have the same problem. – wjandrea May 12 '22 at 23:45
  • @wjandrea Oh man, you guys have just made up for hours of searching. Thank you! – Stephen Flynn May 12 '22 at 23:48
  • Also also BTW, get out of the habit of using `for s in range(len(data)):`. You can just do `for d in data:`, and then `if "caption" in d:`. – wjandrea May 12 '22 at 23:52
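
The suggestions in the comments above can be put side by side in a short sketch. The sample string `raw` and the helper `caption_or_none` are made up for illustration (they are not from the question); the second entry has no `caption` key and the third has an empty one, which is where the approaches differ: `in` only tests whether the key exists, while `.get()` returns `None` for a missing key, so a truthiness test treats missing and empty captions alike.

```python
import json

# Hypothetical sample standing in for result.json.
raw = """
[
  {"caption": "This is water.", "images": []},
  {"images": []},
  {"caption": "", "images": []}
]
"""
data = json.loads(raw)

# 1. Membership test: True whenever the key exists, even if its value is empty.
has_key = ["caption" in d for d in data]

# 2. dict.get: None for a missing key, so missing and empty are both falsy.
has_text = [bool(d.get("caption")) for d in data]


# 3. try/except: catch the KeyError instead of checking first.
def caption_or_none(d):
    try:
        return d["caption"]
    except KeyError:
        return None


captions = [caption_or_none(d) for d in data]

# The loop from the question, iterating over the entries directly.
for d in data:
    if "caption" in d:
        print("Caption is here")
    else:
        print("Caption is not here")
```

Which one fits probably depends on whether an empty caption should count as "grabbed": the membership test says yes, the `.get()` truthiness test says no.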
