I want to get all GET and POST parameters from a web page. Given some web page, I can already extract all of its links. But if the page accepts input parameters (GET and POST), how can I find them? My algorithm is like this:

find every <form method="GET">...</form> (or method="POST") element in the page;
then for each form found:
     get its <input> fields and construct the request
     then save it somewhere
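
The steps above can be sketched with nothing but the standard library. This is a minimal Python 3 illustration, not a complete crawler: the names `FormCollector` and `extract_forms` are made up for this example, only `<input>` elements are collected (not `<select>` or `<textarea>`), and fetching the page is left out.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect every <form> and the <input> fields nested inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form, in document order
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action", ""),
                # a form without a method attribute is submitted with GET
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def extract_forms(html):
    """Return a list of dicts describing the forms found in an HTML string."""
    parser = FormCollector()
    parser.feed(html)
    return parser.forms
```

Calling `extract_forms(page_source)` on a downloaded page returns one dictionary per form, which can then be saved for later analysis.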

My goal is to write a crawler that collects all links and all GET and POST parameters from a web site and saves them for further analysis. My algorithm is simple, so I want to know whether there is a better way to do this in Python. Can you recommend any Python libraries?

torayeff
  • Please describe what you want to do. – Denis May 16 '12 at 09:02
  • I want to write a crawler for a web application vulnerability scanner, so the crawler must get all links and all GET and POST parameters from the web page and store them for the scanner to analyze. – torayeff May 16 '12 at 09:06
  • So you simply need all the links and forms on the page? If so, you can try BeautifulSoup or lxml; I prefer the latter. – Denis May 16 '12 at 09:08
  • That is also what I wanted to do, but I wanted to know whether there is another way of doing it. – torayeff May 16 '12 at 09:10

1 Answer

How about something like this to get you started? It pulls out forms and input attributes:

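import urllib2

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)

# Fetch the page and parse it
s = urllib2.urlopen('http://stackoverflow.com/questions/10614974/how-to-get-post-and-get-parameters-from-web-page-in-python').read()
soup = BeautifulSoup(s)

# Print each form's action and method, then the attributes of its inputs
forms = soup.findAll('form')
for form in forms:
  # .get() avoids a KeyError when a form omits the attribute
  print 'form action: %s (%s)' % (form.get('action', ''), form.get('method', 'get'))
  inputs = form.findAll('input')
  for field in inputs:
    print "  -> %s" % (field.attrs)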

Output (for this page):

form action: /search (get)
  -> [(u'autocomplete', u'off'), (u'name', u'q'), (u'class', u'textbox'), (u'placeholder', u'search'), (u'tabindex', u'1'), (u'type', u'text'), (u'maxlength', u'140'), (u'size', u'28'), (u'value', u'')]
form action: /questions/10614974/answer/submit (post)
  -> [(u'id', u'fkey'), (u'name', u'fkey'), (u'type', u'hidden'), (u'value', u'923d3d8b45bbca57cbf0b126b2eb9342')]
  -> [(u'id', u'author'), (u'name', u'author'), (u'type', u'text')]
  -> [(u'id', u'display-name'), (u'name', u'display-name'), (u'type', u'text'), (u'size', u'30'), (u'maxlength', u'30'), (u'value', u''), (u'tabindex', u'105')]
  -> [(u'id', u'm-address'), (u'name', u'm-address'), (u'type', u'text'), (u'size', u'40'), (u'maxlength', u'100'), (u'value', u''), (u'tabindex', u'106')]
  -> [(u'id', u'home-page'), (u'name', u'home-page'), (u'type', u'text'), (u'size', u'40'), (u'maxlength', u'200'), (u'value', u''), (u'tabindex', u'107')]
  -> [(u'id', u'submit-button'), (u'type', u'submit'), (u'value', u'Post Your Answer'), (u'tabindex', u'110')]
Maria Zverina
  • If you haven't installed BeautifulSoup already, you can do so with "pip install BeautifulSoup". – Maria Zverina May 16 '12 at 09:29
  • For http://www.ertir.com it gives this error: Traceback (most recent call last): File "akaParser.py", line 86, in getForms(parsedPage) File "akaParser.py", line 75, in getForms method = form['method'] File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 881, in __getitem__ return self.attrs[key] KeyError: 'method' – torayeff May 16 '12 at 22:47
  • The form doesn't specify a "method". Check the .attrs field for the presence of 'method' rather than assuming it exists. – Maria Zverina May 17 '12 at 09:01
  • If a form does not have a method attribute, which method will it use, GET or POST? For example, on the given page: – torayeff May 17 '12 at 09:43
  • GET - http://stackoverflow.com/questions/2314401/what-is-the-default-form-posting-method – Maria Zverina May 17 '12 at 09:55
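
Once the method is settled (GET when the attribute is missing, as noted in the comments above), the collected fields can be turned into a request description. The following is a rough Python 3 sketch using only the standard library; `build_request` and its parameters are hypothetical names chosen for this example, and any method other than POST is treated as GET for simplicity.

```python
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


def build_request(page_url, action, method, fields):
    """Return (method, url, body) for submitting a form found on page_url."""
    # A missing or empty method attribute means GET
    method = "POST" if (method or "").lower() == "post" else "GET"
    # Resolve a relative action against the page; an empty action targets the page itself
    target = urljoin(page_url, action or "")
    data = urlencode(fields)
    if method == "POST":
        # POST sends the encoded fields in the request body
        return method, target, data
    # GET replaces the query string of the target URL with the encoded fields
    parts = urlsplit(target)
    return method, urlunsplit(parts._replace(query=data)), None
```

For a form with action "/search" and no method, `build_request(page_url, "/search", None, {"q": "python"})` yields a GET request whose URL carries the fields in its query string; for a POST form the same fields end up in the body instead.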