When launching Firefox with Selenium, how can I get all the text displayed on the page and save it to a text file?

  • What do you mean by all the text? In that case, why do you want Selenium? If the page has AJAX in it, getting the full source code is also difficult. Be clearer about what you are trying to achieve; then people here can definitely help. – Yogeesh Seralathan Aug 01 '14 at 09:45
  • @YogeeshSeralathan There is only HTML text on the page I want to visit. No videos, no animations, no Java applets or JavaScript. –  Aug 01 '14 at 09:48
  • You don't need Selenium; use BeautifulSoup. – Padraic Cunningham Aug 01 '14 at 10:55
  • @ArtjomB. Yes, but the page requires basic authentication, which is why I used Selenium. Once authenticated, I can access the page I want and retrieve its text. –  Aug 01 '14 at 12:17
  • @PadraicCunningham Thank you for the advice. Is BeautifulSoup able to launch a browser with a given URL as Selenium does? I had never heard of BeautifulSoup before you mentioned it. –  Aug 01 '14 at 12:19
  • @new_programmar Why do you need Selenium to get authenticated? POST the username and password as parameters, handle the cookies, and retrieve the source code of the page. **FYI:** Selenium is usually used for handling dynamically loaded HTML and for client-side testing. – Yogeesh Seralathan Aug 01 '14 at 15:46

3 Answers

Give this a try

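from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("http://www.google.com")
# The text of the root <html> element is the visible text of the whole page
print(browser.find_element(By.XPATH, "/html").text)
browser.quit()  # close the browser when done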
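
The question also asks about saving the text to a file, which the snippet above does not do. A minimal sketch of that step, assuming Python 3 and a Selenium version that provides `find_element(By..., ...)`; the helper names `fetch_page_text` and `save_text` and the file name `page.txt` are made up for illustration:

```python
def fetch_page_text(url):
    """Open `url` in Firefox and return the visible text of the page body."""
    # Imported inside the function so save_text() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        return browser.find_element(By.TAG_NAME, "body").text
    finally:
        # Always close the browser, even if loading the page fails.
        browser.quit()


def save_text(text, path):
    """Write `text` to `path` as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Something like `save_text(fetch_page_text("http://www.google.com"), "page.txt")` would then write the page text to `page.txt`.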
Christoph Hegemann

Use httplib2 if HTML is the only thing you want to retrieve. As the documentation states, the simplest use is this:

import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://example.org/", "GET")
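
Note that `content` here is the raw HTML, not the text a browser would display. A rough sketch of reducing it to text, assuming Python 3 and only the standard library; `TextExtractor` and `html_to_text` are hypothetical names (not part of httplib2), UTF-8 is assumed when decoding bytes, and the result is cruder than what a browser renders:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(content):
    """Accepts bytes (as returned by h.request) or str; returns plain text."""
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")  # assumed encoding
    parser = TextExtractor()
    parser.feed(content)
    parser.close()
    return parser.text()
```

Calling `html_to_text(content)` on the response body above would give one line per text fragment, which can then be written to a file.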
chuse

Selenium is overkill for this sort of thing. You can use Python's built-in httplib (named http.client in Python 3), so you don't need any third-party dependencies.

from http.client import HTTPConnection  # Python 2: from httplib import HTTPConnection

conn = HTTPConnection("example.com")
conn.request("GET", "/")  # the path or the complete URL
print(conn.getresponse().read())  # read() returns the body as bytes

If basic authentication is needed, you only need to additionally send an Authorization request header containing the base64-encoded credentials.

This will of course not work if the site uses a custom authentication scheme.
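
The basic authentication header mentioned above could be built roughly as follows. This is a sketch assuming Python 3 (where httplib is named http.client); `basic_auth_header` and `fetch` are hypothetical helper names, and the host, path, and credentials are placeholders:

```python
import base64
from http.client import HTTPConnection  # named httplib in Python 2


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic authentication."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


def fetch(host, path, username, password):
    """GET `path` from `host` with Basic credentials; returns the body as bytes."""
    conn = HTTPConnection(host)
    try:
        conn.request("GET", path, headers=basic_auth_header(username, password))
        return conn.getresponse().read()
    finally:
        conn.close()
```

Since Basic credentials are only encoded, not encrypted, `HTTPSConnection` from the same module is usually the safer choice when the server supports it.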

Artjom B.