Extracting Field Names of an HTML Form - Python

Question

Assume that there is a link "http://www.someHTMLPageWithTwoForms.com" which is an HTML page containing two forms (say Form 1 and Form 2). I have code like this:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer
h = httplib2.Http('.cache')
response, content = h.request('http://www.someHTMLPageWithTwoForms.com')
for field in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
        if field.has_key('name'):
                print field['name']

This returns all the field names that belong to both Form 1 and Form 2 of my HTML page. Is there any way to get only the field names that belong to a particular form (say Form 2 only)?

Anas · Answer 1 · 2013-11-11T12:43:02.197

5

If it's only 2 forms you may try this one:

from BeautifulSoup import BeautifulSoup

forms = BeautifulSoup(content).findAll('form')
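# forms[1] is the second form; search all <input> tags nested inside it
# (iterating over the form directly would also yield text nodes)
for field in forms[1].findAll('input'):
    if field.get('name'):
        print(field['name'])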

If it's not only about the 2nd form, you can make the selection more specific (by an id or class attribute):

from BeautifulSoup import BeautifulSoup

forms = BeautifulSoup(content).findAll('form', attrs={'id': 'yourFormId'})
# forms[0] is the first form matching the id; search its nested <input> tags
for field in forms[0].findAll('input'):
    if field.get('name'):
        print(field['name'])

edited Nov 11 '13 at 12:43

answered Aug 02 '11 at 11:15

Anas


I have tried this solution but got following error message: ": global name 'BeautifulSoup' is not defined" – Khokhar Nov 08 '13 at 14:09
Please make sure that BeautifulSoup is installed and imported. I edited the response for the import. – Anas Nov 11 '13 at 12:44

for version 4 use 'from bs4 import BeautifulSoup' – shao.lo Aug 22 '16 at 15:49
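Following up on the version 4 remark, here is a minimal sketch of the same idea with bs4 on Python 3. The sample markup, the form ids and the helper name field_names are assumptions made up for the illustration; in the question, content would be the downloaded page body.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the real response body.
content = """
<html><body>
  <form id="form1" action="/foo">
    <input name="uselessInput" type="text" />
  </form>
  <form id="form2" action="/bar">
    <p><input name="firstInput" type="text" /></p>
    <input name="secondInput" type="text" />
    <input type="submit" value="Go" />
  </form>
</body></html>
"""


def field_names(form):
    """Return the names of all named <input> tags nested anywhere in the form."""
    return [field["name"] for field in form.find_all("input") if field.get("name")]


soup = BeautifulSoup(content, "html.parser")

# Select by position (the second form on the page) ...
second_form = soup.find_all("form")[1]
# ... or by id, which does not depend on the order of the forms.
form_by_id = soup.find("form", id="form2")

print(field_names(second_form))
print(field_names(form_by_id))
```

Selecting by id is usually the more robust choice, since an index breaks as soon as another form is added to the page.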

score 1 · Answer 2 · answered Feb 06 '18 at 14:27

If you have the lxml and cssselect Python packages installed, you can collect the input names and values of a form into a dict:

from lxml import html
def parse_form(form):
    tree = html.fromstring(form)
    data = {}
    for e in tree.cssselect('form input'):
        if e.get('name'):
            data[e.get('name')] = e.get('value')
    return data
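If neither package is available, a rough alternative is the standard library's html.parser module (Python 3). This is a different technique from the lxml one above, shown only as a sketch: the class name FormFieldCollector, the helper fields_of and the sample page are all made up for the example, and nested or malformed forms are not handled.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the names of form controls, grouped by the form they sit in."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.forms = []       # one (form attributes, field names) pair per <form>
        self._current = None  # field list of the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = []
            self.forms.append((attrs, self._current))
        elif tag in self.FIELD_TAGS and self._current is not None:
            if attrs.get("name"):
                self._current.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def fields_of(html_text, form_name):
    """Return the field names of the first form whose name attribute matches."""
    collector = FormFieldCollector()
    collector.feed(html_text)
    for attrs, names in collector.forms:
        if attrs.get("name") == form_name:
            return names
    return []


# Hypothetical sample page with two forms.
page = """
<form name="form1" action="/foo">
    <input name="uselessInput" type="text" />
</form>
<form name="form2" action="/bar">
    <div><input name="firstInput" type="text" /></div>
    <input name="secondInput" type="text" />
</form>
"""

print(fields_of(page, "form2"))
```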

score 1 · Answer 3 · edited Dec 04 '19 at 22:47


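If you know an attribute's name and value, you can search on them. Be careful with an attribute that is itself called name: findAll treats the name keyword as the tag name, so the attribute has to be passed through attrs instead.

from BeautifulSoup import BeautifulStoneSoup
xml = '<person name="Bob"><parent rel="mother" name="Alice">'
xmlSoup = BeautifulStoneSoup(xml)

# name= is matched against tag names, so this finds nothing
xmlSoup.findAll(name="Alice")
# []

# attrs= is matched against attributes, so this finds the <parent> tag
xmlSoup.findAll(attrs={"name": "Alice"})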

edited Dec 04 '19 at 22:47

CivFan


answered Aug 02 '11 at 11:17

Kracekumar


score 1 · Accepted Answer · answered Aug 02 '11 at 12:19

This kind of parsing is also quite easy with lxml (which I personally prefer over BeautifulSoup because of its XPath support). For example, the following snippet prints the names of all fields (if they have one) that belong to forms named "form2":

# you can ignore this part, it's only here for the demo
from StringIO import StringIO
HTML = StringIO("""
<html>
<body>
    <form name="form1" action="/foo">
        <input name="uselessInput" type="text" />
    </form>
    <form name="form2" action="/bar">
        <input name="firstInput" type="text" />
        <input name="secondInput" type="text" />
    </form>
</body>
</html>
""")

# here goes the useful code
import lxml.html
tree = lxml.html.parse(HTML) # you can pass parse() a file-like object or an URL
root = tree.getroot()
for form in root.xpath('//form[@name="form2"]'):
    for field in form.getchildren():
        if 'name' in field.keys():
            print field.get('name')

This is not so good: it only looks at immediate children of the form element and does not check whether they are form inputs (other elements may also have name attributes). – janek37 Jun 21 '17 at 11:35
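One way to address that comment, sketched for Python 3 under the assumption that only input, select and textarea elements count as fields: let the XPath expression search all descendants of the form and filter by tag. The fieldset, select and textarea in the sample markup are additions made up for this illustration and are not part of the answer's demo.

```python
import lxml.html

# Demo markup: form2 nests one input in a fieldset that also has a name.
HTML = """
<html>
<body>
    <form name="form1" action="/foo">
        <input name="uselessInput" type="text" />
    </form>
    <form name="form2" action="/bar">
        <fieldset name="group">
            <input name="firstInput" type="text" />
        </fieldset>
        <select name="choice"><option>a</option></select>
        <textarea name="notes"></textarea>
        <input type="submit" value="Send" />
    </form>
</body>
</html>
"""

root = lxml.html.fromstring(HTML)

# Named form controls at any depth below form2. The fieldset is skipped
# even though it has a name attribute; the unnamed submit button is too.
FIELDS = (
    '//form[@name="form2"]'
    '//*[self::input or self::select or self::textarea][@name]'
)

names = [element.get("name") for element in root.xpath(FIELDS)]
print(names)
```

The `//` step after the form predicate is what replaces getchildren(): it matches descendants at any depth instead of direct children only.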
