
I am very new to web scraping and to Python in general. I am working on a project that requires me to scrape data from a website that refreshes its data every 10 minutes. I was able to scrape the data for the current 10-minute window, but once the data refreshes, the previous data is no longer valid. I need help with 3 things:

  1. There is a timestamp input at the top of the website. How can I change the time in that input so that I fetch data only for that particular time period?

  2. My current code is:

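    from datetime import datetime
    from pathlib import Path

    import pandas as pd

    URL1 = "URL.com"

    # read_html returns a list with one DataFrame per <table> on the page
    tables1 = pd.read_html(URL1)
    print("There are:", len(tables1), "tables")

    # The required table is the one at index 8 of that list
    PartUsage = tables1[8]

    # The table has no timestamp of its own, so stamp it with the scrape time
    now = datetime.now()
    PartUsage["Date"] = now
    PartUsage.set_index("Date", inplace=True)

    # Create the output folder if needed, then write the table to CSV
    filepath = Path('Path.csv')
    filepath.parent.mkdir(parents=True, exist_ok=True)
    PartUsage.to_csv(filepath)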

I added a timestamp because the required table does not contain one. How can I link that timestamp to the page so that it is used as the input?

This is company-specific data, so I cannot provide the link or any further details. Any help will be appreciated. Thank you.

Tim Roberts
rajat
  • Well, you can certainly wrap (almost) all of that code in `while True:` / ... / `time.sleep(10*60)` to have it repeat every 10 minutes. To send form fields, you will have to look at the HTML of the page. The times probably get sent as POST parameters. – Tim Roberts Aug 25 '22 at 18:19
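
To illustrate the POST-parameter idea from the comment, here is a minimal sketch using only the standard library. Everything specific in it is an assumption, since the real page cannot be shared: the URL `https://example.com/report`, the field name `timestamp`, and the time format are placeholders, and `build_timestamp_request` is a hypothetical helper. The real field name and format have to be read from the page's form HTML or from the browser's network tab, and the site may turn out to use GET query parameters instead.

```python
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import Request


def build_timestamp_request(url, when, field_name="timestamp",
                            time_format="%Y-%m-%d %H:%M"):
    """Build (but do not send) a POST request carrying the chosen time."""
    # field_name and time_format are placeholders: take the real ones from
    # the <form>/<input> elements of the page being scraped.
    form_fields = {field_name: when.strftime(time_format)}
    # Form fields are sent URL-encoded in the request body.
    body = urlencode(form_fields).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical URL and time, for illustration only.
req = build_timestamp_request("https://example.com/report",
                              datetime(2022, 8, 25, 18, 10))
print(req.get_method(), req.data)
```

Sending the request with `urllib.request.urlopen(req)` (or, with the `requests` library, `requests.post(url, data=form_fields)`) should return the HTML for the requested period if the site works this way. That HTML can then be parsed with `pd.read_html`, and the same `when` value can be stored in the `Date` column so that each saved row carries the time that was requested rather than the time the script ran.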

1 Answer


You can use cron for this. It is a utility that runs scripts on a fixed schedule. You can also deploy the script in an automatically started Docker container for convenience. You can find more about cron-like scheduling in Python here: How do I get a Cron like scheduler in Python?
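
If a pure-Python alternative to cron is preferred, the scheduling can be sketched roughly as follows. This assumes the site updates on 10-minute wall-clock boundaries (:00, :10, :20, ...), which the question does not confirm. `next_slot` and `run_on_schedule` are hypothetical names, and `task` stands for whatever function wraps the scraping code from the question.

```python
import time
from datetime import datetime, timedelta


def next_slot(now, interval_minutes=10):
    """Return the first interval boundary strictly after `now`.

    Assumes interval_minutes divides 60 evenly (10, 15, 30, ...).
    """
    # Round down to the start of the current slot, then step one slot forward.
    floored = now.replace(minute=now.minute - now.minute % interval_minutes,
                          second=0, microsecond=0)
    return floored + timedelta(minutes=interval_minutes)


def run_on_schedule(task, runs, interval_minutes=10,
                    clock=datetime.now, sleep=time.sleep):
    """Call `task(slot)` once per interval, `runs` times in total."""
    for _ in range(runs):
        slot = next_slot(clock(), interval_minutes)
        # Wait until the boundary; never pass a negative value to sleep().
        sleep(max(0.0, (slot - clock()).total_seconds()))
        # The slot time is handed to the task so it can be used as the
        # timestamp for the scraped rows.
        task(slot)
```

For example, `run_on_schedule(scrape_once, runs=6)` would call a user-defined `scrape_once(slot)` at the next six 10-minute marks. Compared with a plain `time.sleep(10*60)` loop, sleeping until the next boundary keeps the runs from drifting by the time each scrape takes. With cron itself, the equivalent would be a crontab entry using the `*/10 * * * *` schedule that launches the script.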

didhat