I built a scraper in Python. It works when I hard-code a static URL, but I would like to loop over every URL in a JSON file.

This code raises a KeyError. I read online that this happens because the parsed JSON is an object (a dict in Python) and not an array, but I'm not sure how to fix it. Could someone point me in the right direction or maybe even review the code? I added some screenshots of the errors, the way I look up the JSON info, and the way the JSON file is structured.
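To illustrate the object-versus-array point, here is a minimal sketch. The JSON string and the `lookup_error` helper are made up for the demonstration and are not the contents of my real file. Indexing a dict with `[0]` looks for a key equal to the integer `0`, which does not exist, so Python raises KeyError rather than returning the first item.

```python
import json

# Hypothetical data for illustration only; not the real data.json.
parsed = json.loads('{"url": [{"loc": ["https://example.com/"]}]}')


def lookup_error(container, key):
    """Return the name of the exception raised by container[key], or None."""
    try:
        container[key]
    except (KeyError, IndexError, TypeError) as exc:
        return type(exc).__name__
    return None


# A JSON object becomes a dict, so [0] is treated as a key lookup.
print(lookup_error(parsed, 0))
# Looking up a key that exists raises nothing.
print(lookup_error(parsed, "url"))
# Lists (JSON arrays) are the parts that accept integer positions.
print(parsed["url"][0]["loc"][0])
```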

JSON structure:

[Image: screenshot of the JSON structure]

from bs4 import BeautifulSoup
import requests
import json

with open("C:\data.json") as my_json:
    json_dict = json.load(my_json)
for website in json_dict[0][0]:
    print("About to scrape: ", website)


print('step 1')
#get url
page_link = website
print('step 2')
#open page
page_response = requests.get(page_link, timeout=1)
print('step 3')
#parse page
page_content = BeautifulSoup(page_response.content, "html.parser")
print('step 4')
#Find info
naam = page_content.find_all(class_='<random class>')[0].decode_contents()
print('step 5')
#Print
print(naam)
Barmar

1 Answer

In your JSON file, the opening bracket of the whole thing is unnamed. Try naming it data, then access the URL itself with json_dict['data']['url']['loc'][0].

damaredayo
  • Before you add a comment saying it didn't work, I just added a [0] to the end of the expression since I didn't realize you're using an array in the json file (for whatever reason) – damaredayo Sep 26 '18 at 19:27
  • I'm glad I refreshed. :-) Unfortunately, this gave an error as well. $ python c:/Users/Sebastiaan/Desktop/PythonBoek/scraperPython/scraper2.0.py Traceback (most recent call last): File "c:/Users/Sebastiaan/Desktop/PythonBoek/scraperPython/scraper2.0.py", line 8, in <module> for website in json_dict['data']['url']['loc'][0]: KeyError: 'data' I changed my JSON file like you requested. { "data": { "url": [ { "loc": [ "" ], "changefreq": [ "daily" ] }, – Sebastiaan_L Sep 26 '18 at 19:34
  • I meant writing data = { ... in your JSON, not {data : – damaredayo Sep 26 '18 at 19:36
  • It wouldn't be a valid `json` file if it contained `data={...}` – Craicerjack Sep 26 '18 at 19:46
  • the ... was a placeholder lmao – damaredayo Sep 26 '18 at 19:53
  • @TylerT I don't understand. **In your json file, the opening bracket of the whole thing is unnamed, try naming it data...** - naming it data would mean it's no longer valid JSON. – Craicerjack Sep 26 '18 at 20:00
  • Adding data= breaks the JSON file. It's not helping, I'm sorry. I know you are trying to help. – Sebastiaan_L Sep 26 '18 at 20:07
  • Can you edit the answer to show precisely what you mean by adding a name? – Barmar Sep 26 '18 at 20:16
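For what it's worth, the fragment quoted in the comments suggests that "url" is a list of objects, each holding a "loc" list. If that is right, json_dict['data']['url']['loc'] would fail even with a valid "data" key, because a list cannot be indexed with 'loc'. Below is a minimal sketch under that assumption. The sample document, the example.com URLs, the `extract_urls` helper and its `root_key` parameter are all hypothetical, since the full file is not shown in the thread.

```python
import json

# Hypothetical sample mirroring the fragment quoted in the comments;
# the example.com URLs are placeholders, not data from the real file.
SAMPLE = """
{
  "data": {
    "url": [
      {"loc": ["https://example.com/a"], "changefreq": ["daily"]},
      {"loc": ["https://example.com/b"], "changefreq": ["daily"]}
    ]
  }
}
"""


def extract_urls(json_dict, root_key="data"):
    """Collect every non-empty string found in the "loc" lists."""
    urls = []
    # "url" is a list, so walk its entries instead of indexing it by name.
    for entry in json_dict[root_key]["url"]:
        for loc in entry.get("loc", []):
            if loc:  # skip empty placeholders such as ""
                urls.append(loc)
    return urls


json_dict = json.loads(SAMPLE)
for website in extract_urls(json_dict):
    print("About to scrape:", website)
    # The requests.get(...) and BeautifulSoup(...) steps from the question
    # would go here, indented so that they run once per URL.
```

With a real file, json.load(my_json) would replace json.loads(SAMPLE), and root_key would need to match whatever top-level key the file actually uses.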