3

The code below clicks the File menu on a page that contains an Excel worksheet.

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(r"foo%20Data%20235.xlsx&DefaultItemOpen=3") # dummy link
driver.find_element_by_css_selector('#jewel-button-middle > span').click() # responsible for clicking the file menu
driver.quit()

I don't know how to click the first option, i.e. the Download a Snapshot option, in the popup menu. I am not able to inspect the elements of the popup (dropdown) menu. I want the xlsx file to be downloaded.

enter image description here

Avinash Raj
  • Could you share the actual link if possible? Or could you dump `driver.page_source` once you have opened the page shown in the screenshot, and add the relevant part of the HTML (the menu with the "Download a Snapshot" link) to the question? Thanks. – alecxe Jan 11 '17 at 04:36
  • @alecxe Hi.. here is the link http://www.cbe.org.eg/en/EconomicResearch/Publications/_layouts/xlviewer.aspx?id=/MonthlyStatisticaclBulletinDL/External%20Sector%20Data%20235.xlsx&DefaultItemOpen=1# – Avinash Raj Jan 18 '17 at 08:42

4 Answers

2

It is easier to inspect such elements (dropdowns that close themselves) using Firefox: open the developer tools, select the element-picker option from the Firebug toolbar (marked with a red square in the picture), and hover the mouse cursor over the element.

enter image description here

As for the question, the locator you are looking for is ('[id*="DownloadSnapshot"] > span')

from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for the By.CSS_SELECTOR locators below
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(r"foo%20Data%20235.xlsx&DefaultItemOpen=3") # dummy link

wait = WebDriverWait(driver, 10)

wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, '[id*="loadingTitleText"]')))

driver.find_element_by_css_selector('#jewel-button-middle > span').click() # responsible for clicking the file menu

download = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '[id*="DownloadSnapshot"] > span')))
driver.get_screenshot_as_file('fileName')
download.click()
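
The explicit waits in the code above follow a poll-until-true pattern: a condition is checked repeatedly until it holds or a timeout expires. The following is a minimal stand-alone sketch of that idea, not Selenium's actual implementation; `wait_until` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition becoming truthy.
    (Hypothetical helper, only meant to illustrate the polling idea.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

With a pattern like this, the script proceeds as soon as the condition holds, so a generous timeout only costs time when something is actually slow or broken.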
Guy
  • Note that it takes 1-2 seconds for the download to start, you might want to delay the `driver.quit()` – Guy Jan 19 '17 at 07:35
  • I tried to take a screenshot after the driver clicked the File menu, but the dropdown does not appear in the screenshot. – Avinash Raj Jan 19 '17 at 08:08
  • So it should wait up to 10 seconds for the element to become visible, right? – Avinash Raj Jan 19 '17 at 08:44
  • @AvinashRaj Exactly. You can read [here](http://selenium-python.readthedocs.io/waits.html) more about waits. – Guy Jan 19 '17 at 08:46
  • @AvinashRaj I edited my answer; you need to wait for the page to load before clicking the `file` tab. – Guy Jan 19 '17 at 09:15
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/133529/discussion-between-avinash-raj-and-guy). – Avinash Raj Jan 19 '17 at 09:35
2

The idea is to load the page with PhantomJS, wait for the contents of the workbook to load, and collect all the parameters needed for a request to the download file handler endpoint, which we can then make with the `requests` package.

Full working solution:

import json

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WORKBOOK_TYPE = "PublishedItemsSnapshot"

driver = webdriver.PhantomJS()
driver.maximize_window()
driver.get('http://www.cbe.org.eg/en/EconomicResearch/Publications/_layouts/xlviewer.aspx?id=/MonthlyStatisticaclBulletinDL/External%20Sector%20Data%20235.xlsx&DefaultItemOpen=1#')

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.ID, "ctl00_PlaceHolderMain_m_excelWebRenderer_ewaCtl_rowHeadersDiv")))

# get workbook uri
hidden_input = wait.until(EC.presence_of_element_located((By.ID, "ctl00_PlaceHolderMain_m_excelWebRenderer_ewaCtl_m_workbookContextJson")))
workbook_uri = json.loads(hidden_input.get_attribute('value'))['EncryptedWorkbookUri']

# get session id
session_id = driver.find_element_by_id("ctl00_PlaceHolderMain_m_excelWebRenderer_ewaCtl_m_workbookId").get_attribute("value")

# get workbook filename
workbook_filename = driver.find_element_by_xpath("//h2[contains(@class, 's4-mini-header')]/span[contains(., '.xlsx')]").text

driver.close()

print("Downloading workbook '%s'..." % workbook_filename)
response = requests.get("http://www.cbe.org.eg/en/EconomicResearch/Publications/_layouts/XlFileHandler.aspx", params={
    'id': workbook_uri,
    'sessionId': session_id,
    'workbookFileName': workbook_filename,
    'workbookType': WORKBOOK_TYPE
})
with open(workbook_filename, 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
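
To make the request step easier to follow, here is a small sketch of how the handler URL is assembled from the collected parameters, using only the standard library. The handler address, parameter names, and workbook type are taken from the code above; `build_download_url` and all the `PLACEHOLDER` values are hypothetical, and the exact encoding `requests` produces may differ in small details.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the solution above.
HANDLER_URL = "http://www.cbe.org.eg/en/EconomicResearch/Publications/_layouts/XlFileHandler.aspx"


def build_download_url(context_json, session_id, filename,
                       workbook_type="PublishedItemsSnapshot"):
    """Build the download URL from values scraped off the viewer page.

    `context_json` is the JSON string stored in the hidden workbook-context
    input; only its 'EncryptedWorkbookUri' key is used here.
    """
    workbook_uri = json.loads(context_json)["EncryptedWorkbookUri"]
    query = urlencode({
        "id": workbook_uri,
        "sessionId": session_id,
        "workbookFileName": filename,
        "workbookType": workbook_type,
    })
    return HANDLER_URL + "?" + query


# Placeholder inputs for illustration only; real values come from the page.
example_context = json.dumps({"EncryptedWorkbookUri": "PLACEHOLDER-URI"})
url = build_download_url(example_context, "PLACEHOLDER-SESSION", "example.xlsx")
print(url)
```

Printing the URL this way can help when debugging, for example to compare it with the request the browser sends when the menu item is clicked manually.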
alecxe
  • Hi @alecxe. Thanks for your answer. Is there any way to use only PhantomJS up to the last step? That is, after PhantomJS clicks the Download a Snapshot button, it would get the download URL and hand it over to the requests library. – Avinash Raj Jan 25 '17 at 06:08
  • @AvinashRaj glad it worked for you. I believe that PhantomJS is not able to save downloads automatically. The download link is dynamically constructed, not sure if there is an easy way to get it instead of constructing it manually as we are currently doing.. – alecxe Jan 25 '17 at 17:21
1

I observed that the File menu does not show any options until the Excel workbook is completely loaded, so I added a wait until the workbook is loaded.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
from selenium.webdriver.common.action_chains import ActionChains

browser = webdriver.PhantomJS()
browser.maximize_window()
browser.get('http://www.cbe.org.eg/en/EconomicResearch/Publications/_layouts/xlviewer.aspx?id=/MonthlyStatisticaclBulletinDL/External%20Sector%20Data%20235.xlsx&DefaultItemOpen=1#')

wait = WebDriverWait(browser, 10)
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//td[@data-range='B59']")))
element = wait.until(EC.element_to_be_clickable((By.ID, 'jewel-button-middle')))
element.click()
eleDownload = wait.until(EC.element_to_be_clickable((By.XPATH,"//span[text()='Download a Snapshot']")))
eleDownload.click()
sleep(5)
browser.quit()
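
As an alternative to a fixed `sleep(5)`, one could poll the download directory until the file shows up. This is only a sketch under assumptions: it presumes a browser that actually saves downloads into a known directory (the comments below suggest PhantomJS may not), `wait_for_file` is a hypothetical helper, and it does not verify that the file has finished downloading.

```python
import os
import time


def wait_for_file(directory, suffix=".xlsx", timeout=30.0, poll_interval=0.5):
    """Poll `directory` until a file ending in `suffix` appears.

    Returns the path of the first match (in sorted order), or None if
    nothing shows up within `timeout` seconds. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = sorted(
            name for name in os.listdir(directory) if name.endswith(suffix)
        )
        if matches:
            return os.path.join(directory, matches[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A call such as `wait_for_file("/path/to/downloads")` would go between the click and `browser.quit()`; the directory path here is a placeholder.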
Naveen Kumar R B
  • Increase the timeout in `WebDriverWait` from 10 to 20 seconds if your internet connection is slow. I observed that the Excel workbook takes time to load. – Naveen Kumar R B Jan 19 '17 at 10:33
  • OK, cool. But I want to do this with `webdriver.PhantomJS`. – Avinash Raj Jan 19 '17 at 10:35
  • Just replace `webdriver.Firefox()` with `webdriver.PhantomJS()`; the rest is the same. – Naveen Kumar R B Jan 19 '17 at 10:36
  • Your code seems to work with the Firefox web driver, but with PhantomJS it throws an exception on line 13. How do I get the file downloaded? Note that I'm using PhantomJS. – Avinash Raj Jan 19 '17 at 10:46
  • Use the following answer, http://stackoverflow.com/a/18440478/2575259, to save the file automatically in the Firefox browser. In PhantomJS this seems to be an issue; more details: https://github.com/ariya/phantomjs/issues/10052 and http://stackoverflow.com/questions/25755713/using-selenium-with-python-and-phantomjs-to-download-file-to-filesystem – Naveen Kumar R B Jan 19 '17 at 10:53
  • Thanks for your answer. – Avinash Raj Jan 25 '17 at 06:09
0

Find the menu element by id or tag, loop over its options, pick the one you want, and then click it.
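
A rough sketch of that loop follows. It assumes each option object exposes a `.text` attribute and a `.click()` method, as Selenium's `WebElement` does; how the options are located on this particular page is not specified here, so the sketch takes them as a plain list. `click_option` and `FakeOption` are hypothetical names, and `FakeOption` only stands in for real elements so the example runs without a browser.

```python
def click_option(options, wanted_text):
    """Click the first option whose visible text equals `wanted_text`.

    Returns the clicked option, or None if no option matched.
    """
    for option in options:
        if option.text.strip() == wanted_text:
            option.click()
            return option
    return None


class FakeOption:
    """Stand-in for a browser element, used only for this demonstration."""

    def __init__(self, text):
        self.text = text
        self.clicked = False

    def click(self):
        self.clicked = True


# With Selenium, `menu` would instead be a list of located elements.
menu = [FakeOption("Download a Snapshot"), FakeOption("Reload Workbook")]
chosen = click_option(menu, "Download a Snapshot")
print(chosen.text if chosen else "no matching option")
```

Matching on the visible text keeps the loop independent of generated element ids, at the cost of breaking if the menu label changes or is localized.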

F.Moure