
For one of my personal projects, I'm trying to web-scrape some financial data, and I would like to run the script daily through Windows Task Scheduler.

This is my current code:

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
import selenium.webdriver.support.ui as ui
from selenium.webdriver.support.ui import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')


mainurl = "https://apa.nexregreporting.com/home/portfoliocompression"

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.x Safari/53x'}
page = requests.get(mainurl, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

When I use this code, it gives me a ConnectionError:

HTTPSConnectionPool error, Max retries exceeded with url:

How do I get Python to click the blue search button and save the Excel file into a designated folder? I also noticed that the HTML element for the blue search button isn't a standard one.

The website is https://apa.nexregreporting.com/home/portfoliocompression

  • Possible duplicate of [Max retries exceeded with URL in requests](https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url-in-requests) – AMC Nov 21 '19 at 06:18
  • Your question seems overly broad, it might be better to focus on fixing that error first. – AMC Nov 21 '19 at 06:19
  • Why are you using selenium, requests and beautifulsoup when they all do the same task? You also never initialize the webdriver. As for the error: you may need to go through a proxy if you are behind one, or use time.sleep to wait between requests; you can also pass verify=False as a parameter to requests.get (see the sketch after these comments). – Harish Vutukuri Nov 21 '19 at 07:24
  • @datacookies, you edited the question stating that data is confidential. Please note that question versions are kept, so the original text is still available for anyone. If it is really necessary to remove that data from the site, you have two options: [check this post](https://meta.stackexchange.com/questions/25088/how-can-i-delete-my-question-on-stack-overflow) to remove the question yourself. The site might refuse to do so. If that's the case, you can try to flag your question for moderator intervention, and explain your request. – caxcaxcoatl Nov 22 '19 at 00:42
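
A minimal sketch of the verify=False approach mentioned in the comments, reusing the URL and headers from the question. Disabling certificate verification is insecure and may still not help if the server blocks non-browser clients, so treat it as a diagnostic step rather than a fix:

import requests
import urllib3
from bs4 import BeautifulSoup

# Suppress the InsecureRequestWarning that verify=False triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

mainurl = "https://apa.nexregreporting.com/home/portfoliocompression"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.x Safari/53x'}

# verify=False skips TLS certificate verification; timeout avoids hanging forever
page = requests.get(mainurl, headers=headers, verify=False, timeout=30)
print(page.status_code)

soup = BeautifulSoup(page.content, 'html.parser')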

1 Answer


This is the code to open Chrome using Selenium and download the file by clicking on the button.

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Options for Chrome WebDriver
op = Options()
op.add_argument('--disable-notifications')
op.add_experimental_option("prefs",{
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True 
})

# Download Path
download_dir = 'D:\\'

# Initializing the Chrome webdriver with the options
driver = webdriver.Chrome(options=op)

# Allowing downloads via the DevTools Page.setDownloadBehavior command and pointing them at download_dir
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
command_result = driver.execute("send_command", params)

driver.implicitly_wait(5)

# Opening the page
driver.get("https://apa.nexregreporting.com/home/portfoliocompression")

# Click the search button, then give the download 10 seconds to finish
driver.find_element(By.XPATH, '//*[@class="btn btn-default"]').click()
time.sleep(10)

# Closing the webdriver
driver.close()
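
If the script is going to run unattended from Windows Task Scheduler, a headless variant that sets the download folder through Chrome preferences may be easier to maintain. This is a sketch under a few assumptions: the button's XPath from the answer above still matches, a recent Chrome with Selenium 4 is installed (the --headless=new flag and find_element(By.XPATH, ...) syntax need that), and D:\reports is just a placeholder folder. Instead of a fixed sleep, it polls for Chrome's temporary .crdownload files to decide when the download has finished.

import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

download_dir = r'D:\reports'  # placeholder download folder
os.makedirs(download_dir, exist_ok=True)

op = Options()
op.add_argument('--headless=new')           # run without a visible browser window
op.add_argument('--disable-notifications')
op.add_experimental_option("prefs", {
    "download.default_directory": download_dir,  # save files straight into this folder
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True,
})

driver = webdriver.Chrome(options=op)
driver.implicitly_wait(5)

try:
    driver.get("https://apa.nexregreporting.com/home/portfoliocompression")

    # Same locator as in the answer above
    driver.find_element(By.XPATH, '//*[@class="btn btn-default"]').click()

    # Crude wait: give the download a moment to start, then poll until
    # no temporary .crdownload file remains (up to ~60 seconds)
    time.sleep(5)
    for _ in range(60):
        if not any(f.endswith('.crdownload') for f in os.listdir(download_dir)):
            break
        time.sleep(1)
finally:
    driver.quit()

In older headless Chrome, the Page.setDownloadBehavior command used in the answer above was needed to allow downloads at all; with --headless=new, setting download.default_directory in the preferences is usually sufficient.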