1

I am working on data scraping from a website. I download the data.json file from the Network panel of the browser's inspect element. Then read the JSON file locally to store the results. My problem is that I want to make this script fetch automatically this data.json file every couple of hours and record the information.

feed_me_pi
  • 167
  • 1
  • 1
  • 11
  • 1
    Does this answer your question? [Scheduling Python Script to run every hour accurately](https://stackoverflow.com/questions/22715086/scheduling-python-script-to-run-every-hour-accurately) – bertdida Aug 16 '20 at 05:00
  • @bertdida Thank you for your response but I am looking for "how can I fetch that data.json link through my script" which is updated every 15-30 minutes in the network panel. – feed_me_pi Aug 16 '20 at 23:27

2 Answers2

2

Don't try to get anything out of Chrome -- that's unnecessary.

The SPA there is making a call to a metadata url to get the current "directory" (datetime) and then using that directory to lookup the latest interval_generation_data.

This will get you the data every minute. Notice there's no error handling in here so your loop will end the first time you get a 403 or similar.

import requests
import json
import time

s = requests.Session()
s.headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'referer': 'https://outagemap.coned.com/external/default.html',
    'x-requested-with': 'XMLHttpRequest'
}

metadata_url = 'https://outagemap.coned.com/resources/data/external/interval_generation_data/metadata.json'
json_url = "https://outagemap.coned.com/resources/data/external/interval_generation_data/"

while True:
    r = s.get(metadata_url, params={'_': int(time.time())})
    directory = r.json()['directory']

    r = s.get(json_url + f'{directory}/data.json', params={'_': int(time.time())})
    print(r.json())
    time.sleep(60)
kerasbaz
  • 1,774
  • 1
  • 6
  • 15
1
import requests
import threading
import time
import datetime

def get_API(N):
    #your instruction goes here (the url request)
    time.sleep(N)#Goes to Sleep after Nth seconds 
    get_API(N) #Then executes after Sleep
t=threading.Thread(target=get_API(4000)) #4000 is seconds and it means it re-runs the get_API function after 1 hr and ~6 min
t.start()
Adv Rap
  • 11
  • 3