29

Suppose I need to perform a set of procedures on a particular website: fill in some forms, click the submit button, send the data back to the server, receive the response, then do something based on the response and send data back to the website's server again. I know there is a webbrowser module in Python, but I want to do this without invoking any web browser. It has to be a pure script.

Is there a module available in Python that can help me do that?

Thanks.

Milen A. Radev
kush87
  • Duplicate: http://stackoverflow.com/search?q=%5Bpython%5D+scraping. Every question on screen scraping answers this question. Specifically: http://stackoverflow.com/questions/419260/grabbing-text-from-a-webpage – S.Lott Aug 18 '09 at 10:23
  • Selenium is the only full solution to this as far as I can tell, and I have looked at every option for this sort of thing I can find. If you just need to grab web pages or do basic form entry, mechanize will do fine, but for real browser emulation it seems you need Selenium. – Rick Aug 25 '10 at 22:45

15 Answers

19

Selenium will do exactly what you want, and it handles JavaScript.
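
A minimal sketch of the idea with the Selenium Python bindings (Selenium 4 style); the URL and the form field name are hypothetical placeholders:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Start a real browser (needs a matching driver, e.g. geckodriver for Firefox)
driver = webdriver.Firefox()
try:
    driver.get("http://example.com/form")         # hypothetical URL

    # Fill in a form field and submit it back to the server
    field = driver.find_element(By.NAME, "q")     # hypothetical field name
    field.send_keys("some value")
    field.submit()

    # The response page, with its JavaScript already executed
    print(driver.page_source)
finally:
    driver.quit()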

adaptive
  • Although I don't think this can be done headless, which is what is often implied by "pure script", this will as closely as possible emulate a real browser experience...since it's using a real browser. Most sites today are completely broken without JavaScript, which makes mechanize obsolete. – Cerin Oct 15 '10 at 13:34
  • This is wrong: you can easily run Selenium headless by using pyvirtualdisplay to give the browser a virtual display. – Amistad Jun 05 '14 at 21:06
  • There is http://www.seleniumhq.org/docs/03_webdriver.jsp#htmlunit-driver. Also see https://github.com/detro/ghostdriver. Both of these are for headless JavaScript; the first is official and the second is third-party. – Ajeeb.K.P Apr 17 '15 at 10:11
18

You can also take a look at mechanize. It's meant to handle "stateful programmatic web browsing" (as per its site).
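
A minimal sketch of the mechanize API; the URL, form index, and field names here are hypothetical placeholders:

import mechanize

br = mechanize.Browser()
br.open("http://example.com/login")   # hypothetical URL

br.select_form(nr=0)                  # pick the first form on the page
br["username"] = "me"                 # hypothetical field names
br["password"] = "secret"
response = br.submit()                # sends the form back to the server

print(response.read())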

Julien
arcanum
  • mechanize, in my experience, is pretty slow, but once HTTPS, cookies, and logins are involved, it's *much* easier than urllib2. – Gregg Lind Aug 18 '09 at 14:52
  • Selenium provides a lot more than mechanize. Mechanize is good for basic stuff, but it will cause issues if you are trying to do real browser emulation: it doesn't do things like automatically download images and CSS files, and it seems to always be detectable by the strictest sites as an automated tool. – Rick Aug 25 '10 at 22:48
  • Unfortunately, mechanize is not maintained anymore, and does not support Python 3. – Matthieu Moy Aug 22 '16 at 12:15
  • As of March 2017, maintenance has been taken over by someone else and it does indeed support Python 3: https://github.com/python-mechanize/mechanize – Julien Oct 15 '19 at 00:28
8

All the answers are old; I recommend, and am a big fan of, requests.

From the homepage:

Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks.

Things shouldn't be this way. Not in Python.
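
For the question's fill-submit-react loop, a minimal sketch with requests; the URLs and payloads are hypothetical placeholders:

import requests

# POST form data and keep cookies across requests via a session
session = requests.Session()
response = session.post("http://example.com/submit",        # hypothetical URL
                        data={"spam": 1, "eggs": 2})

# Decide what to do next based on the response, then send more data
if "success" in response.text:
    session.post("http://example.com/next-step", data={"confirm": "yes"})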

mForte
Foo Bar User
8

I think the best solution is the mix of requests and BeautifulSoup. I just wanted to add this so the question can be kept up to date.
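
A minimal sketch of the combination; the URL and payload are hypothetical placeholders:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

response = requests.get("http://example.com/form")       # hypothetical URL
soup = BeautifulSoup(response.text, "html.parser")

# Inspect the form to find where it posts to and what fields it has
form = soup.find("form")
action = urljoin(response.url, form["action"])   # the form target may be relative

reply = requests.post(action, data={"field": "value"})   # hypothetical payload
print(BeautifulSoup(reply.text, "html.parser").get_text())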

mit
Leonardo
3

Selenium (http://www.seleniumhq.org/) is the best solution for me. You can code against it with ease in Python, Java, or any programming language you like, and its recorder makes it easy to capture a browser session and convert it into a program.

Yuda Prawira
2

There are plenty of built-in Python modules that would help with this, for example urllib and htmllib.

The problem will be simpler if you change the way you're approaching it. You say you want to "fill some forms, click the submit button, send the data back to the server, receive the response", which sounds like a four-stage process.

In fact, what you need to do is post some data to a webserver and get a response.

This is as simple as:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()

(example taken from the urllib docs).

What you do with the response depends on how complex the HTML is and what you want to do with it. You might get away with parsing it using a regular expression or two, or you can use the htmllib.HTMLParser class, or maybe a higher-level, more flexible parser like Beautiful Soup.
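
For instance, here is a minimal sketch with the standard-library parser (htmllib is Python 2 only; in Python 3 the equivalent class lives in html.parser), pulling the links out of a page:

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag seen in the document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

parser = LinkExtractor()
parser.feed("<a href='/next'>next</a> <a href='/prev'>prev</a>")
print(parser.links)  # ['/next', '/prev']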

roomaroo
2

Do not forget zope.testbrowser, which is a wrapper around mechanize.

zope.testbrowser provides an easy-to-use programmable web browser with special focus on testing.
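
A minimal sketch, assuming zope.testbrowser's Browser API; the URL and the control names are hypothetical placeholders:

from zope.testbrowser.browser import Browser

browser = Browser()
browser.open("http://example.com/form")           # hypothetical URL

# Fill in controls by name or label, then submit
browser.getControl(name="username").value = "me"  # hypothetical control name
browser.getControl("Submit").click()

print(browser.contents)   # the response body after submission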

JamesThomasMoon
2

Selenium 2 includes WebDriver, which has Python bindings and allows one to use the headless HtmlUnit driver, or to switch to Firefox or Chrome for graphical debugging.
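
Roughly, with the Selenium 2-era Python bindings and a Selenium server already running locally (treat the capability name and hub URL as assumptions to verify against your version):

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to a running Selenium server and ask for the headless HtmlUnit driver
driver = webdriver.Remote(
    command_executor="http://127.0.0.1:4444/wd/hub",
    desired_capabilities=DesiredCapabilities.HTMLUNIT,
)
driver.get("http://example.com")   # hypothetical URL
print(driver.page_source)
driver.quit()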

Nathan
1

The best solution that I have found (and am currently implementing) is:

  • scripts in Python using the Selenium WebDriver
  • the PhantomJS headless browser (if Firefox is used instead, you will have a GUI and it will be slower)
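
A minimal sketch, assuming PhantomJS is installed and on the PATH; the URL is a hypothetical placeholder:

from selenium import webdriver

# PhantomJS runs headless, so no browser window appears
driver = webdriver.PhantomJS()
driver.get("http://example.com/form")   # hypothetical URL
print(driver.title)
driver.quit()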

Kostas Demiris
1

I have found the iMacros Firefox plugin (which is free) to work very well.

It can be automated with Python using Windows COM object interfaces. Here's some example code from http://wiki.imacros.net/Python. It requires Python Windows Extensions:

import win32com.client

def Hello():
    w = win32com.client.Dispatch("imacros")  # attach to the iMacros COM server
    w.iimInit("", 1)                         # initialise the scripting interface
    w.iimPlay("Demo\\FillForm")              # play back the FillForm demo macro

if __name__ == '__main__':
    Hello()
twasbrillig
0

You likely want urllib2. It can handle things like HTTPS, cookies, and authentication. You will probably also want BeautifulSoup to help parse the HTML pages.
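
A minimal Python 2-era sketch of that combination (urllib2 with a cookie jar, BeautifulSoup 3 for parsing); the URL and form fields are hypothetical placeholders:

import urllib
import urllib2
import cookielib
from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3, Python 2 era

# An opener that keeps cookies across requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

data = urllib.urlencode({'user': 'me', 'pass': 'secret'})  # hypothetical fields
response = opener.open("http://example.com/login", data)   # hypothetical URL

soup = BeautifulSoup(response.read())
print soup.title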

Steven Huwig
  • 20,015
  • 9
  • 55
  • 79
0

You may have a look at these slides from the last Italian PyCon (pdf): the author lists most of the libraries for doing scraping and automated browsing in Python.

I very much like twill (which has already been suggested); it was developed by one of the authors of nose and is specifically aimed at testing web sites.
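
A minimal sketch of twill's command API; the URL, the form number, and the field names are hypothetical placeholders:

from twill.commands import go, fv, submit, show

go("http://example.com/login")   # hypothetical URL
fv("1", "username", "me")        # form 1, hypothetical field names
fv("1", "password", "secret")
submit()                         # click the form's submit button
show()                           # print the resulting page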

dalloliogm
0

Internet Explorer specific, but rather good:

http://pamie.sourceforge.net/

The advantage compared to urllib/BeautifulSoup is that it executes JavaScript as well, since it uses IE.
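
PAMIE drives Internet Explorer through its COM automation interface. As a rough illustration of that underlying mechanism (this is raw COM via pywin32, not PAMIE's own API; the URL is a placeholder):

import time
import win32com.client

ie = win32com.client.Dispatch("InternetExplorer.Application")
ie.Visible = 1
ie.Navigate("http://example.com")   # hypothetical URL

# Wait for the page (and its JavaScript) to finish loading
while ie.Busy:
    time.sleep(0.5)

print(ie.Document.title)
ie.Quit()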

fraca7
0

httplib2 + BeautifulSoup

Use Firefox + Firebug + httpreplay to see what the JavaScript passes to and from the browser on the website. Using httplib2 you can essentially do the same via POST and GET.
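
A minimal sketch of replaying such a request with httplib2; the URL and body are hypothetical placeholders:

import httplib2
from urllib.parse import urlencode

h = httplib2.Http()

# Replay the POST you observed in Firebug
body = urlencode({'spam': 1, 'eggs': 2})   # hypothetical payload
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
response, content = h.request("http://example.com/query",   # hypothetical URL
                              "POST", body=body, headers=headers)
print(content)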

0

For automation you definitely might want to check out

webbot

It is based on Selenium and offers a lot more features with very little code, like automatically finding elements to perform actions such as click and type based on your parameters.

It even works for sites with dynamically changing class names and IDs.

Here are the docs: https://webbot.readthedocs.io/
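
A minimal sketch based on the webbot docs; the URL and the field/button texts are hypothetical placeholders:

from webbot import Browser

web = Browser()
web.go_to("http://example.com/login")   # hypothetical URL

# webbot locates the elements itself from the visible text / field hints
web.type("me", into="Username")         # hypothetical field label
web.type("secret", into="Password")
web.click("Sign in")                    # hypothetical button text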

Natesh bhat