Questions tagged [mechanize]

Mechanize is a library for automated web browsing, originally developed for Perl; Python and Ruby implementations now exist as well.

Mechanize is a Ruby library for automating interaction with websites. It automatically stores and sends cookies, follows redirects, can follow links, and lets you populate and submit forms. Mechanize also keeps a history of the sites you have visited. It is adapted from the Perl module, and there is also a version for Python.

2512 questions

178 votes, 5 answers

Adding a directory to sys.path / PYTHONPATH

I am trying to import a module from a particular directory. The problem is that if I use sys.path.append(mod_directory) to append the path and then open the Python interpreter, the directory mod_directory gets added to the end of the list sys.path.…
UnadulteratedImagination
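
A minimal sketch for the question above, assuming the goal is for mod_directory to be searched before everything else: sys.path.append adds the entry to the end of the search list, whereas sys.path.insert(0, ...) puts it first. The helper name prepend_to_sys_path is hypothetical.

```python
import os
import sys


def prepend_to_sys_path(directory):
    """Make `directory` the first entry on sys.path (hypothetical helper)."""
    directory = os.path.abspath(directory)
    # Drop any existing copies so the directory is listed only once.
    while directory in sys.path:
        sys.path.remove(directory)
    # insert(0, ...) puts it ahead of every other entry; append() would put
    # it last, so an earlier entry providing the same module name would win.
    sys.path.insert(0, directory)
    return sys.path
```

Call prepend_to_sys_path("mod_directory") before the import statement. Setting the PYTHONPATH environment variable is an alternative that needs no code: its entries are placed after the script's own directory but ahead of the standard library and site-packages.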

156 votes, 7 answers

How to avoid HTTP error 429 (Too Many Requests) in Python

I am trying to use Python to log in to a website and gather information from several web pages, and I get the following error: Traceback (most recent call last): File "extract_test.py", line 43, in response=br.open(v) File…
Aous1000
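
A minimal sketch of one common approach to HTTP 429: retry with exponential backoff, honoring the server's Retry-After header when it is given in seconds. It assumes the fetch callable raises urllib.error.HTTPError; depending on the version, br.open may raise a mechanize-specific error class instead, so adapt the except clause. The function name and default limits are hypothetical.

```python
import random
import time
import urllib.error


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url), retrying on HTTP 429 with exponential backoff.

    Assumes `fetch` raises urllib.error.HTTPError on an error status.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Only 429 is retried; other errors, or running out of
            # retries, propagate to the caller unchanged.
            if err.code != 429 or attempt == max_retries:
                raise
            headers = err.headers
            retry_after = headers.get("Retry-After") if headers else None
            if retry_after is not None and retry_after.isdigit():
                # Server gave an explicit wait in seconds (HTTP-date
                # values are not handled in this sketch).
                delay = float(retry_after)
            else:
                # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
                # with random jitter so parallel clients do not sync up.
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay *= random.uniform(0.5, 1.0)
            sleep(delay)
```

With mechanize this might look like fetch_with_backoff(br.open, v), where br and v are the browser and URL from the question. Taking sleep as a parameter keeps the function testable without real waits.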

72 votes, 8 answers

Which is best in Python: urllib2, PycURL or mechanize?

OK, so I need to download some web pages using Python and did a quick investigation of my options. Included with Python: urllib, though it seems to me that I should use urllib2 instead; urllib has no cookie support and handles HTTP/FTP/local files only (no SSL). urllib2…
bigredbob
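
For comparison with the options listed above: in Python 3, urllib2 was merged into urllib.request, and cookie support comes from combining it with http.cookiejar. A minimal standard-library sketch of a cookie-aware opener; the function name and User-Agent string are hypothetical placeholders.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="example-scraper/0.1"):
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor saves Set-Cookie headers into `jar` and adds
    # matching Cookie headers to every later request made by this opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Sent with every request; the value here is only a placeholder.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar
```

Calling opener.open(url) then stores any cookies the server sets in jar and sends them back on later requests to that site. Form filling, link following and history are what mechanize adds on top of this.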

55 votes, 8 answers

Screen scraping: getting around "HTTP Error 403: request disallowed by robots.txt"

Is there a way to get around the following? "httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt". Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales,…
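
Mechanize's Browser checks robots.txt before each request by default, and the 403 above is raised by mechanize itself rather than sent by the server; br.set_handle_robots(False) turns that check off. To see which paths a site's rules actually disallow, the standard library's urllib.robotparser can evaluate them. A minimal sketch; the function name, user-agent string and sample rules are hypothetical.

```python
import urllib.robotparser


def robots_permissions(robots_lines, user_agent, urls):
    """Return {url: bool} saying whether robots.txt permits each URL.

    `robots_lines` is the robots.txt content as a list of lines.
    """
    parser = urllib.robotparser.RobotFileParser()
    # parse() also marks the rules as loaded, so can_fetch() will use them.
    parser.parse(robots_lines)
    return {url: parser.can_fetch(user_agent, url) for url in urls}
```

In real use, RobotFileParser.set_url() pointed at the site's /robots.txt followed by .read() fetches the live file instead of parsing a list.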

38 votes, 8 answers

How to handle IncompleteRead in Python

I am trying to fetch some data from a website. However, it returns an incomplete read. The data I am trying to get is a huge set of nested links. I did some research online and found that this might be due to a server error (A chunked transfer…
user1967046
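
One workaround often used for this error is to keep whatever arrived before the connection broke: http.client.IncompleteRead (httplib.IncompleteRead in Python 2) stores the bytes received so far in its partial attribute. A minimal sketch, assuming you can pass in the response's read method; whether a truncated body is acceptable depends on the application, and retrying the request is the other option. The helper name is hypothetical.

```python
import http.client


def read_allowing_partial(read):
    """Call read() and return (data, complete).

    If the server ends a (typically chunked) response early, http.client
    raises IncompleteRead; its .partial attribute holds the bytes received.
    """
    try:
        return read(), True
    except http.client.IncompleteRead as err:
        return err.partial, False
```

For example, data, complete = read_allowing_partial(response.read) returns the body plus a flag to check before parsing.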

34 votes, 1 answer

How do I parse an HTML table with Nokogiri?

I installed Ruby and Mechanize. It seems to me that it is possible in Nokogiri to do what I want, but I do not know how. What about this table? It is just part of the HTML of a vBulletin forum site. I tried to keep the HTML structure…
Radek

33 votes, 2 answers

Python Mechanize select a form with no name

I am attempting to have mechanize select a form from a page, but the form in question has no "name" attribute in the HTML. What should I do? When I try to use br.select_form(name = "") I get errors that no form is declared with that name, and the…
Mantas Vidutis

31 votes, 4 answers

How to set custom user-agent for Mechanize in Rails

I know you have a set of pre-defined aliases you can use by setting agent.user_agent_alias = 'Linux Mozilla' for instance, but what if I want to set my own user agent, as I'm writing a web crawler and want to identify it, for the sites I'm…
Bashar Abdullah

29 votes, 4 answers

Python mechanize - two buttons of type 'submit'

I have a mechanize script written in Python that fills out a web form and is supposed to click the 'create' button. But there's a problem: the form has two buttons, one for 'add attached file' and one for 'create'. Both are of type 'submit', and…
directedition

29 votes, 4 answers

Scrape the absolute URL instead of a relative path in Python

I'm trying to get all the hrefs from an HTML page and store them in a list for future processing, such as this: Example URL: www.example-page-xl.com
Hello World
user7800892
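
A minimal standard-library sketch: collect each <a href> with html.parser and resolve it with urllib.parse.urljoin, which leaves absolute URLs unchanged and resolves relative ones against the page's URL. The class and function names are hypothetical, and the caller must supply a full base URL including the scheme (the question gives only www.example-page-xl.com).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative hrefs ("/faq", "page2.html") are resolved against
                # base_url; absolute ones pass through unchanged.
                self.links.append(urljoin(self.base_url, value))


def absolute_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

If the page was fetched with mechanize, each link returned by br.links() already carries an absolute_url attribute.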

28 votes, 3 answers

How do I use Mechanize to process JavaScript?

I'm connecting to a website and logging in. The website redirects me to new pages, and Mechanize handles all the cookies and redirects, but I can't get the last page. I used Firebug to do the same thing again and saw that there are two more pages I…
user96960

28 votes, 5 answers

Mechanize and Javascript

I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM Events and AJAX, and so far I've found no way to do that. I looked at some Python client browsers that support JavaScript like Spynner and Zope, and…
Jeff Klip

26 votes, 11 answers

How to install mechanize for Python 2.7?

I saved mechanize in my Python 2.7 directory. But when I type import mechanize into the Python shell, I get an error message that reads: Traceback (most recent call last): File "", line 1, in import mechanize ImportError:…
user601828

26 votes, 9 answers

Is there a PHP equivalent of Perl's WWW::Mechanize?

I'm looking for a library that has functionality similar to Perl's WWW::Mechanize, but for PHP. Basically, it should allow me to submit HTTP GET and POST requests with a simple syntax, and then parse the resulting page and return in a simple format…
davr

25 votes, 3 answers

Why does accessing an SSL site with Mechanize fail on Windows but work on Mac?

This is the code I'm using to connect to the SSL site: require 'mechanize'; a = Mechanize.new; page = a.get 'https://site.com' I'm using Ruby 1.9.3 and Mechanize 2.1pre1 + dependencies. On Mac the above code works and returns the page. On…
Kassym Dorsel