
I am fairly new to web scraping. I am trying to write something in Python with Selenium that will automatically log in to a website and select multiple options from a drop-down menu. Once all of those options have been set, a button is clicked and a new page pops up with multiple hrefs. This is where I run into problems: I am trying to click all of the hrefs, but they all have this structure

<a href="WebsiteName.asp?qt=1&amp;qa=0&amp;ben=1&amp;tpt=0&amp;cl=Something&amp;gl=1&amp;life=1&amp;smo=1">Export</a>  

Where only 'life=1' and 'smo=1' may change to something else in the above HTML.
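Since the query string is the only part that varies, the 'life' and 'smo' values can be pulled out of each href with the standard library (a sketch using the sample href above):

```python
from urllib.parse import urlparse, parse_qs

href = "WebsiteName.asp?qt=1&qa=0&ben=1&tpt=0&cl=Something&gl=1&life=1&smo=1"

# parse_qs maps each query parameter name to a list of its values
params = parse_qs(urlparse(href).query)
life = params["life"][0]   # '1'
smo = params["smo"][0]     # '1'
```

This makes it easy to tell the otherwise-identical links apart before deciding which one to follow.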

Most other problems that I have encountered here tend to involve hrefs with a class or something similar that makes clicking the links more convenient.

The code below is what I have so far.

import selenium,time
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup, SoupStrainer
import requests

#credentials
usernameStr = 'SomeUsername'
passwordStr = 'SomePassword'

browser = webdriver.Firefox(executable_path = r'C:\Users\Name\Downloads\geckodriver-v0.24.0-win64\geckodriver.exe')
url = 'http://somewebsite.com/something/'
browser.get(url)

username = browser.find_element_by_id('username')
username.send_keys(usernameStr)

password = browser.find_element_by_id('password')
password.send_keys(passwordStr)

loginInButton = browser.find_element_by_id("login")
loginInButton.click()

browser.find_element_by_xpath("//*[@id='LifeType']").send_keys("Dual")
browser.find_element_by_id("btnRefresh").click()
browser.find_element_by_id("btnExport").click()
 
other_url = 'http://somewebsite.com/something/exportToExcelChoice.asp?qt=1&qa=0&ben=1&tpt=0&gl=1&cl=CAESFFHIILNI'

Below is where I encounter the problem:

page = requests.get(other_url)    
data = page.text
soup = BeautifulSoup(data, features="html.parser")

for link in soup.find_all('a'):
    link.get('href')
    browser.find_element_by_link_text("Export").click()

With Beautiful Soup I can easily print out the required links, but I am not sure it is even necessary, since I cannot click the links. I am still trying to work this out.
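One way around clicking entirely (a sketch; the base URL and hrefs are the placeholder values from above): turn each relative href Beautiful Soup finds into an absolute URL, then open it with browser.get, which in the logged-in browser session should trigger the same export a click would.

```python
from urllib.parse import urljoin

# The page the links live on (placeholder URL from the question)
base = "http://somewebsite.com/something/exportToExcelChoice.asp?qt=1&qa=0"

# Relative hrefs as link.get('href') returns them (sample values)
hrefs = [
    "Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=1",
    "Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=2",
]

# urljoin resolves each relative href against the page URL
urls = [urljoin(base, h) for h in hrefs]

# for u in urls:
#     browser.get(u)   # opens each export URL in the logged-in session
```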

PS I know this isn't strictly web scraping, since all I am doing is clicking buttons, with the ultimate goal of putting everything into a csv file.
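One caveat with the code above: requests.get(other_url) starts a fresh session that is not logged in, so the export links may just redirect to the login page. The Selenium session's cookies can be carried over; a minimal stdlib sketch (the cookie name/value and URL are made-up placeholders — browser.get_cookies() would supply the real dicts):

```python
from urllib.request import Request

# Shape of what browser.get_cookies() returns: a list of dicts (sample values)
selenium_cookies = [{"name": "ASPSESSIONID", "value": "ABC123"}]

# Fold the cookies into a single Cookie header string
cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)

req = Request(
    "http://somewebsite.com/something/Name.asp?life=1&smo=1",
    headers={"Cookie": cookie_header},
)
# urllib.request.urlopen(req) would then download the csv inside the
# logged-in session instead of an anonymous one
```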

HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head id="Head1">
    <title>
        Quote
    </title>
    <link href="StyleSheet.css" rel="stylesheet" type="text/css" />
    <link rel="StyleSheet" type="text/css" href="/include/arikibo.css" />
    <STYLE type="text/css">
        td {
            font-size: 14px
        }
    </STYLE>

</head>

<body>
    <span STYLE="font-family: Arial, Helvetica, Sans Serif; font-size:20px">

        <table cellpadding="3" cellspacing="0" border="0" >
            <tr>
                <td colspan="5">Please select the type of csv file you wish to generate<br><br>
                <b>Please be patient as this may take a few moments!</b><br><br></td>
            </tr>
            <tr>
                <td>Male s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=1">Export</a></td>
                <td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
                <td>Male s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=2">Export</a></td>
            </tr>
            <tr>
                <td>Female Non-s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=2&smo=1">Export</a></td>
                <td>&nbsp;</td>
                <td>Female s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=2&smo=2">Export</a></td>
            </tr>           
            <tr>
                <td>Joint Non-s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=3&smo=1">Export</a></td>
                <td>&nbsp;</td>
                <td>Joint s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=3&smo=2">Export</a></td>
            </tr>           
            <tr>
                <td>Dual Non-s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=4&smo=1">Export</a></td>
                <td>&nbsp;</td>
                <td>Dual s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=4&smo=2">Export</a></td>
            </tr>           
        </table>
    </span>
</body>

</html>
  • What happens when you click the links? As I understood it, an export to Excel happens – Sers Feb 23 '19 at 17:35
  • Sorry, yes, I should have made that clear. When I click the links, a csv file with some values is exported to a specific directory. – user12321 Feb 23 '19 at 17:39

1 Answer


As I understood, the popup is a new window and you have to switch to it:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...
other_url = 'http://somewebsite.com/something/exportToExcelChoice.asp?qt=1&qa=0&ben=1&tpt=0&gl=1&cl=CAESFFHIILNI'

wait = WebDriverWait(browser, 10)

#handles = driver.window_handles
browser.get(other_url)
#wait.until(EC.new_window_is_opened(handles))
#driver.switch_to.window(driver.window_handles[-1])

links = wait.until(EC.visibility_of_all_elements_located((By.TAG_NAME,"a")))
for link in links:
    link.click()
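One caveat with clicking in a loop: if a click navigates the page (or reloads it), the remaining elements in the list go stale. A safer variant (a sketch; `browser` is the driver from the question, and `unique_hrefs` is a small helper introduced here) collects the hrefs first and opens each URL directly:

```python
def unique_hrefs(hrefs):
    """Drop empty entries and duplicates, keeping first-seen order."""
    seen = set()
    return [h for h in hrefs if h and not (h in seen or seen.add(h))]

# links = browser.find_elements(By.TAG_NAME, "a")
# hrefs = unique_hrefs([link.get_attribute("href") for link in links])
# for href in hrefs:
#     browser.get(href)   # each export URL triggers its csv download
```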
Sers