20

I am trying to use Selenium in Python to save webpages on MacOS Firefox.

So far, I have managed to press COMMAND + S to open the Save As dialog. However, I don't know how to:

  1. change the directory of the file,
  2. change the name of the file, and
  3. click the SAVE AS button.

Could someone help?

Below is the code I used to press COMMAND + S:

ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()

The reason I am using this method is that I get a UnicodeEncodeError when I:

  1. write the page_source to a html file and
  2. store scraped information to a csv file.

Write to a html file:

file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close() 

Write to a csv file:

csv_file_write.writerow(to_write)

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)

Saurabh Gaur
Tommy N
  • I ended up not using the `SAVE AS` method. To solve the html-file and csv-file writing problems, I used codecs and unicodecsv. Refer to RemcoW's comment and this post http://stackoverflow.com/questions/18766955/how-to-write-utf-8-in-a-csv-file for details. – Tommy N Jun 15 '16 at 13:40
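
The error quoted in the question can be reproduced without a browser. The following is a minimal sketch, assuming a hypothetical sample string that contains the same character (`u'\xf8'`, i.e. "ø") in position 1; it only illustrates why the ASCII codec fails where UTF-8 does not.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; u"\xf8" is the character from the traceback.
text = u"S\xf8ren"


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# ASCII only covers code points 0-127, so "ø" (0xF8) cannot be encoded.
ascii_bytes = try_encode(text, "ascii")
# UTF-8 can represent every Unicode code point, so nothing is lost.
utf8_bytes = try_encode(text, "utf-8")

print(ascii_bytes)
print(utf8_bytes)
```

Decoding `utf8_bytes` with UTF-8 gives back the original text, so no information is dropped, which is the concern raised about using `ignore`.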

4 Answers

27
with open('page.html', 'w') as f:
    f.write(driver.page_source)
misantroop
  • Note that `driver.page_source` can crash with pages larger than 200MB in most webdrivers. For huge pages, [using ActionChains](https://stackoverflow.com/questions/10967408/save-a-web-page-with-python-selenium) is more reliable. – Carlos Roldán May 07 '19 at 00:27
  • On Python 2 with unicode in the page source you might need: `driver.page_source.encode('utf-8')`. – mgalgs May 30 '19 at 20:33
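
A possible variant that avoids depending on the platform's default encoding is to name the encoding explicitly when opening the file. This is only a sketch: `save_html` is a hypothetical helper, and `sample_html` stands in for `driver.page_source` so the example runs without a browser.

```python
import io
import os
import tempfile


def save_html(html, path):
    """Write a text (unicode) HTML string to path, encoded as UTF-8."""
    # io.open accepts an encoding argument on both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Hypothetical stand-in for driver.page_source
sample_html = u"<html><body>\xf8</body></html>"
target = os.path.join(tempfile.mkdtemp(), "page.html")
save_html(sample_html, target)
```

With a real session the call would be `save_html(driver.page_source, 'page.html')`.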
9

What you are trying to achieve is impossible to do with Selenium. The dialog that opens is not something Selenium can interact with.

The closest thing you can do is collect the page_source, which gives you the entire HTML of a single page, and save it to a file.

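import codecs
import os

# save_path, file_name and browser are assumed to be defined earlier
completeName = os.path.join(save_path, file_name)
# codecs.open encodes the unicode page source as UTF-8 when writing
with codecs.open(completeName, "w", "utf-8") as file_object:
    file_object.write(browser.page_source)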

If you really need to save the entire website, you should look into a GUI automation tool like AutoIt. This makes it possible to interact with the save dialog.

RemcoW
  • Thank you! I am aware of this method. However, my webpages contain characters that trigger Unicode Encode Errors, so I need to save the webpages in their original format to avoid losing important information. An example of the error is: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128). – Tommy N Jun 15 '16 at 12:56
  • @TommyN When are you getting this error? When trying to write the page_source to the file? – RemcoW Jun 15 '16 at 12:58
  • Yes, it happens when I try to write the page_source to a html file. Would you know if there are any solutions for me to minimize the amount of information lost in regards to those special characters? (I intentionally don't want to use ignore) – Tommy N Jun 15 '16 at 13:02
  • @RemcoW Do you think I can use codecs for writing to a csv file as well? – Tommy N Jun 15 '16 at 13:27
  • @TommyN Take a look at this question for that: http://stackoverflow.com/questions/18766955/how-to-write-utf-8-in-a-csv-file – RemcoW Jun 15 '16 at 13:28
  • [What is the equivalent tool for autoit in ubuntu?](https://askubuntu.com/q/822075/10425) – Martin Thoma Sep 01 '17 at 12:13
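
For the csv part of the question raised in the comments, here is a minimal sketch that assumes Python 3, where the built-in `csv` module works with text and the encoding is set when the file is opened. On Python 2 the built-in module does not handle unicode, which is where a package like unicodecsv comes in. `write_rows` and the sample rows are hypothetical.

```python
import csv
import io
import os
import tempfile


def write_rows(path, rows):
    """Write rows of text to a UTF-8 encoded CSV file (Python 3)."""
    # newline="" lets the csv module control line endings itself.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


# Hypothetical scraped data containing non-ASCII characters
rows = [["name", "city"], ["S\xf8ren", "K\xf8benhavn"]]
csv_path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_rows(csv_path, rows)
```

Reading the file back with the same encoding returns the rows unchanged, including the "ø" characters.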
5

You cannot interact with system dialogs such as the save file dialog. If you want to save the page HTML, you can do something like this:

page = driver.page_source
with open('page.html', 'w') as file_:
    file_.write(page)
Mobrockers
  • Getting the HTML can also be accomplished by using `driver.page_source`. This spares the need for finding the html element and getting its outerHTML manually. – RemcoW Jun 15 '16 at 12:51
2

This is a complete, working example of the answer RemcoW provided:

You first have to install a webdriver, e.g. pip install selenium chromedriver_installer.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# core modules
import codecs
import os

# 3rd party modules
from selenium import webdriver


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'
    # executable_path works on Selenium 3; Selenium 4 expects a Service object instead
    browser = webdriver.Chrome(executable_path=path_to_chromedriver)
    return browser


save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()

url = "https://martin-thoma.com/"
browser.get(url)

complete_name = os.path.join(save_path, file_name)
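# the with block closes the file even if writing fails
with codecs.open(complete_name, "w", "utf-8") as file_object:
    file_object.write(browser.page_source)
# quit() ends the whole driver session, not just the current window
browser.quit()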
Martin Thoma