1

Honestly, I just want to save a web page as a text file. So what I do:

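from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

fp = webdriver.FirefoxProfile()
browser = webdriver.Firefox(firefox_profile=fp)
browser.get('http://www.google.com')

# Send Ctrl+S to open Firefox's "Save As" dialog
saveas = ActionChains(browser).key_down(Keys.CONTROL)\
         .send_keys('s').key_up(Keys.CONTROL)
saveas.perform()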

But this goes straight to Firefox's Save As window, and all I need is to press Enter to save the page. How can I do this?

I don't want to use another library for pressing keys (although I may consider it if there is no alternative). Also, if there is an easier way to save a page as a text file with Selenium, I would adopt it.

udondan
Sergey Ivanov

3 Answers

1

You don't need to manually invoke the "Save" dialog in this case. Just get the complete page source from the .page_source property:

browser.page_source

To save it into a file:

# Python 3: let open() do the encoding instead of writing bytes to a text-mode file
with open('output.html', 'w', encoding='utf-8') as f:
    f.write(browser.page_source)

The reason you may have had difficulty sending the Enter key to this "Save (as)" dialog is that it is a native browser dialog, not a JavaScript popup, so Selenium cannot control it. In cases like that, we usually try to prevent the dialog from opening and have files download automatically by tweaking the Firefox preferences, see:

But since it is a "complete web page" that needs to be downloaded, there is no specific MIME type to configure (at least in Firefox).
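
For ordinary file downloads (as opposed to "Save Page As"), the preference tweak mentioned above might look roughly like the sketch below. The preference names are Firefox's own download settings. The helper names `download_prefs` and `make_firefox_options`, the directory, and the MIME types are illustrative assumptions, and `make_firefox_options` assumes a Selenium version that ships `selenium.webdriver.firefox.options.Options`.

```python
def download_prefs(download_dir, mime_types):
    """Build Firefox preferences that save matching downloads without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox_options(prefs):
    """Copy the preferences onto a Selenium Firefox Options object."""
    # Imported lazily so that building the dict does not require Selenium.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


# Hypothetical directory and MIME types, for illustration only.
prefs = download_prefs("/tmp/downloads", ["text/csv", "application/pdf"])
# With Selenium installed:
#   browser = webdriver.Firefox(options=make_firefox_options(prefs))
```

As noted, this only helps when the server sends a file with a known MIME type; it does not suppress the dialog opened by Ctrl+S.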


And yes, consider what @MattDMo is pointing out: you might not need Selenium here if the page is not built dynamically.

EDIT:

Getting the page text and saving it:

from selenium.webdriver.common.by import By

with open('/Path/to/my/file/output.txt', 'w', encoding='utf-8') as f:
    f.write(browser.find_element(By.TAG_NAME, 'body').text)
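
If the goal is a text file in a chosen directory, named after the page, one possible sketch is shown below. It assumes the page title is an acceptable stand-in for the default name the browser would propose. `safe_filename` and `save_body_text` are hypothetical helpers, not part of Selenium.

```python
import os
import re


def safe_filename(title, default="page"):
    """Turn a page title into a file name by replacing reserved characters."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", title).strip()
    return (cleaned or default) + ".txt"


def save_body_text(browser, directory):
    """Write the visible text of <body> to <directory>/<page title>.txt."""
    # Imported lazily so safe_filename can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    path = os.path.join(directory, safe_filename(browser.title))
    text = browser.find_element(By.TAG_NAME, "body").text
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

Calling `save_body_text(browser, '/Path/to/my/directory')` after `browser.get(...)` would then return the path it wrote to. The result is the rendered text of the body element, which may still differ from what Firefox's own "Text file" export produces.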
Community
alecxe
  • I got a UnicodeError when I'm trying to save it to the file... A webpage is dynamic, and I want to download it as a Text file, not as a Complete Web Page. – Sergey Ivanov Jan 19 '15 at 06:09
  • @SergeyIvanov got it, updated the code in the answer, works for me. Check it out. – alecxe Jan 19 '15 at 11:16
  • It doesn't save a page as a Text file, it probably downloads it as an html file. Besides, it saves to default directory, when I want to specify directory and keep the default name of the file (not encode it). – Sergey Ivanov Jan 20 '15 at 06:59
  • @SergeyIvanov not sure what you are missing here, just change `output.html` to be `/Path/to/my/directory/myfile.txt` and that's it. – alecxe Jan 20 '15 at 12:47
  • No. Formats when you download from Firefox using "Web Page, HTML only" and "Text file" are different. I need the latter. Plus I need a default name of the file, not encoded one. – Sergey Ivanov Jan 21 '15 at 03:08
  • @SergeyIvanov Check the EDIT - is it getting closer to what you are asking about? Thanks. – alecxe Jan 21 '15 at 03:14
  • Interesting... But I think the answer is no. If you save the page manually with "Web Page, HTML only" and "Text file" and search for body tag in the former html file, then it looks drastically different from the latter text file. – Sergey Ivanov Jan 21 '15 at 03:34
0

Simply saving the contents of a webpage doesn't require selenium:

import requests

url = 'https://www.google.com'
r = requests.get(url)
# Explicit encoding avoids errors on platforms whose default is not UTF-8
with open('google.html', 'w', encoding='utf-8') as fh:
    fh.write(r.text)

urllib, urllib2, or similar could also be used if you prefer, but I find requests to be the easiest and most straightforward.
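
For comparison, a standard-library version might look like the sketch below. It uses `urllib.request`, the Python 3 successor to `urllib2`. The function name `save_page` and the UTF-8 fallback are assumptions made for illustration.

```python
import urllib.request


def save_page(url, path):
    """Download url with the standard library and write it to path as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return path
```

Usage would be `save_page('https://www.google.com', 'google.html')`. Like the requests version, this fetches only the raw response and does not run any JavaScript.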

MattDMo
  • Yes, but what about the dynamic part of it - what if a page is constructed with js and with the help of additional AJAX requests? – alecxe Jan 19 '15 at 05:09
  • @alecxe true, `requests` does not handle dynamic content like AJAX. However, from the OP's question - "*Honestly, I just want to save a web page as a text file*" - I assumed this would be enough. – MattDMo Jan 19 '15 at 05:11
  • No, I actually want to download a page with AJAX. – Sergey Ivanov Jan 19 '15 at 05:54
  • @SergeyIvanov OK, then you need to specifically state that in your question. – MattDMo Jan 19 '15 at 05:54
-1

.send_keys(Keys.RETURN) should do it for you.

James Lemieux
  • Won't work - this is not a javascript popup, selenium cannot interact with it. – alecxe Jan 19 '15 at 05:06
  • Ah yes of course. I read too quickly and only saw he was asking how to push 'enter'. `browser.page_source` would definitely be the solution here. – James Lemieux Jan 19 '15 at 05:22