
I want to automate downloading data from specific websites with Python, but I cannot use Selenium or a browser: the code will be scheduled on a server where neither is available.

I tried a Python script that uses the pyautogui package to automate mouse movement and clicks to download the file, but it does not work because the server does not allow opening a browser (and I cannot use Selenium either, which would be the ideal option).

The code is below:

import time
import webbrowser

import pyautogui

url = 'https://covid.cdc.gov/covid-data-tracker/#ed-visits'

# Open the URL in a new tab if a browser window is already open
webbrowser.open_new_tab(url)
print(pyautogui.size())

# Wait for the page to load, then click at a hard-coded screen position
time.sleep(5)
pyautogui.moveTo(1275, 655, duration=5)
pyautogui.click()

# Wait again, then select and confirm the next option with the keyboard
time.sleep(5)
pyautogui.press('down')
pyautogui.press('enter')

I would like some help with other ways to achieve this. Given the limitations above, how can I automate the file download so that I can run or schedule a .py file on the server?

I tried to follow @Olvin Roght's answer to "Is there any way to download csv file from “website button click” using Python?", but I could not find the triggered function or the source file for the CSV.

A screenshot of the inspected download button is attached as well:

[Image: Download_Button_Inspect_Element_snap]

2 Answers


With respect to @epascarello's comment above, it looks like the data is generated on the client side through JavaScript.

How do I know this? Using Chrome's developer console, I switched to the "Network" tab, clicked the "Download CSV" button, and observed what it did - or rather, what it didn't do. It didn't make a request to the server for a CSV file.

This means that Selenium/WebDriver may be your only option for downloading this data from the CDC. This can be difficult and prone to breaking, so might I suggest a different source for the data: the NY Times has made their data available in a GitHub repository.

Kryten
  • Hey Kryten, I do pull a couple of other CSVs from the same NYT GitHub repo, but this data isn't available there, so I was pulling it from this website directly. Thanks for the answer. – jaideep_kashyap Oct 06 '20 at 22:46

In this case I would recommend first finding the API for fetching the data. With a quick network inspection I found that the data is fetched from https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data. You can view all outgoing requests in the network tab in the developer console.
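Once a candidate URL shows up in the Network tab, it helps to look at the shape of the response before writing any conversion code. The sketch below is one way to do that using only the standard library; `fetch_payload` and `summarize_payload` are hypothetical helper names, and the assumption that the endpoint returns a JSON object whose values are lists of records is mine, not something the CDC documents.

```python
import json
from urllib.request import urlopen

# URL taken from the answer; the payload layout is an assumption.
URL = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data"


def fetch_payload(url, timeout=30):
    """Download the URL and parse the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_payload(payload):
    """Map each top-level key to the field names of its first record.

    Values that are not a non-empty list of dicts are reported by type name.
    """
    summary = {}
    for key, value in payload.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            summary[key] = sorted(value[0].keys())
        else:
            summary[key] = type(value).__name__
    return summary
```

Running `print(summarize_payload(fetch_payload(URL)))` on the server should list the keys available in the response, which is enough to decide which fields to write out.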

You can get the JSON data by sending a GET request to the URL, then turn the JSON into a CSV.

import csv

import requests

URL = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data"

# Fetch the JSON payload; fail on an HTTP error instead of parsing an error page
response = requests.get(URL, timeout=30)
response.raise_for_status()
data = response.json()["ed_trend_data"]

# newline="" lets the csv module control line endings
with open("data.csv", "w", newline="") as file:
    csv_file = csv.writer(file, lineterminator="\n")

    # Write heading
    csv_file.writerow(["Geography", "Date", "Syndrome", "Percent"])

    # Write one row per record
    for item in data:
        csv_file.writerow([item["Geography"], item["Date"], item["Indicator"], item["Percent"]])
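For a scheduled job it can be useful to keep the JSON-to-CSV step separate from the download, so it can be checked without network access. Here is a minimal sketch of that idea; `records_to_csv` and `COLUMNS` are hypothetical names, the field names are the ones used in the code above, and the sample row is made up rather than real CDC data.

```python
import csv
import io

# CSV column heading -> JSON field name, following the code above.
COLUMNS = {
    "Geography": "Geography",
    "Date": "Date",
    "Syndrome": "Indicator",
    "Percent": "Percent",
}


def records_to_csv(records, out, columns=COLUMNS):
    """Write a list of dict records to the file-like object `out` as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns.keys())
    for record in records:
        # .get() leaves a cell empty instead of raising if a field is missing
        writer.writerow([record.get(field, "") for field in columns.values()])


# Usage with a made-up sample row (not real CDC data)
buffer = io.StringIO()
records_to_csv(
    [{"Geography": "National", "Date": "2020-10-01", "Indicator": "CLI", "Percent": 1.5}],
    buffer,
)
print(buffer.getvalue())
```

In the real script, `out` would be the file opened with `open("data.csv", "w", newline="")` and `records` would be the list read from the JSON response.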
Drew Snow
  • Hey Drew, thanks for the answer. This was my initial approach as well, but I was unable to find the URL for the GET request in the Network pane of the console. This URL works perfectly! – jaideep_kashyap Oct 06 '20 at 22:41