
I am a novice in Python (I'm a C++ developer), and I am trying to get some hands-on experience with web scraping using IE on Windows.

The problem I am facing is that when I open a URL using the "requests" library, the server always sends me a login page. I figured out why: the server presumes you are coming through IE and tries to execute a function that uses information from the SSO (single sign-on) object, which runs in the background on Windows after the first login to the web server (consider this a weird setup).
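Since the symptom is that requests receives the login page instead of the real content, it can help to detect that case explicitly before parsing. Below is a minimal sketch using only the standard library's `html.parser`. The heuristic (a page containing an `<input type="password">` field is treated as a login form) and the helper name `looks_like_login_page` are assumptions for illustration; nothing in the question guarantees the login page looks like this.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether the document contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag == "input":
            input_type = dict(attrs).get("type") or ""
            if input_type.lower() == "password":
                self.found = True


def looks_like_login_page(html):
    """Hypothetical heuristic: treat any page with a password field as a login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

With requests, this would be called on `response.text` before handing the page to Beautiful Soup, so that a login page is reported instead of silently yielding no data.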

Having observed this, I changed my strategy and started using the webbrowser lib. Now, when I call webbrowser.open("url"), the browser opens the page properly, which is great!

But my problems now are:

1) I do not want the opened browser page to be visible to the user (i.e. the browser should open in the background). I tried this:

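```python
import webbrowser

ie = webbrowser.BackgroundBrowser(webbrowser.iexplore)
ie.Visible = 0  # only sets a Python attribute; BackgroundBrowser has no hide option
ie.open('url')
```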

but it did not work: the page still opens visibly to the user.

2) [This is the main task] I want to scrape the page that is opened in the IE window above. How can I do that? I tried to dig into this link but did not find any APIs for getting the data.

Kindly help.

PS: I have used Beautiful Soup with requests to scrape some other web pages. That worked and I got the data I wanted, but not in this case.

AnotherDeveloper
webbrowser is limited to just opening pages; as far as I'm aware, it can't scrape. Selenium is what you're looking for to drive a web browser (e.g. IE) and then scrape information from the resulting page. Making the Selenium-driven window invisible is tricky, but you can move it off-screen. – Sweet Burlap Jun 17 '16 at 08:31
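A minimal sketch of the approach this comment describes, assuming the `selenium` package and the IE driver (IEDriverServer) are installed and configured on the Windows machine. The function name `fetch_page_source_with_ie` and the off-screen coordinates are illustrative choices. Moving the window off-screen only hides it from view; it is not a true headless mode. Whether IE driven this way gets past the SSO login depends on the server setup described in the question.

```python
def fetch_page_source_with_ie(url, offscreen_x=-10000, offscreen_y=0):
    """Open url in IE via Selenium with the window off-screen; return the page HTML.

    Assumes selenium and IEDriverServer are installed (Windows only).
    """
    # Imported lazily so this function can be defined on machines without Selenium.
    from selenium import webdriver

    driver = webdriver.Ie()
    try:
        # Move the window outside the visible desktop area; it still exists.
        driver.set_window_position(offscreen_x, offscreen_y)
        driver.get(url)
        # page_source returns the source of the current page as seen by the browser.
        return driver.page_source
    finally:
        # Always close the browser and the driver process, even on errors.
        driver.quit()
```

The returned string can then be passed to Beautiful Soup the same way as the requests-based pages, e.g. `BeautifulSoup(html, "html.parser")`.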

1 Answer


The webbrowser module doesn't let you do that. The get function you mentioned retrieves a controller for a registered web browser; it does not perform an HTTP GET request whose result you could scrape.

I don't know what triggers the behavior you described with IE. Have you tried changing your User-Agent to an IE one? You can check this post for more details: Sending "User-agent" using Requests library in Python
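A minimal sketch of the User-Agent suggestion with requests. The IE 11 User-Agent string below is an example value (the exact string varies with the Windows version), and `make_ie_session` is a hypothetical helper name. If the server actually relies on the Windows single sign-on step described in the question rather than on the User-Agent header, changing the header alone will not get past the login page.

```python
import requests

# Example IE 11 User-Agent string (assumption: adjust to match the real browser).
IE11_USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"


def make_ie_session(user_agent=IE11_USER_AGENT):
    """Return a requests.Session that sends the given User-Agent on every request."""
    session = requests.Session()
    # Replaces the default "python-requests/x.y" User-Agent for all requests.
    session.headers.update({"User-Agent": user_agent})
    return session
```

Calling `make_ie_session().get(url)` then sends the IE header, and `response.text` can go to Beautiful Soup as before. Using a session also keeps any cookies the server sets between requests.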

payet_s