2

I have a form like this:

<form id="search" method="get" action="search.php">
      <input type="text" name="query" value="Search"/>
      <input type="submit" value="Submit">
</form> 

And I want the values in this order: method, action, names

["get", "search.php", ["query"]] 

I don't know how to do this with regex, especially since the string spans multiple lines. I am also very new to regex.

  • You wouldn't do it with regex. Why would you want to do it with regex? [Just don't do it](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Daniel Roseman Mar 01 '15 at 15:06
  • In my opinion, the best way is to use an `xml` parsing module – Vivek Sable Mar 01 '15 at 15:11
  • I would have a read of http://stackoverflow.com/a/1732454/1319998 before trying to parse HTML with regex :-) – Michal Charemza Mar 01 '15 at 15:38

3 Answers

3

The proper way to parse an HTML or XML document is to use an HTML (or XML) parser such as BeautifulSoup or lxml. If you still want to use regex, which is not recommended, you can use re.findall as follows:

>>> import re
>>> # s is the HTML string from the question
>>> [i for j in re.findall(r'method="([^ >"]*)"|action="([^ >"]*)"|name="([^ >"]*)"', s) for i in j if i]
['get', 'search.php', 'query']

[^ >"]* matches a run of characters that contains no space, no > and no double quote.
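The pattern above returns a flat list, while the question asks for the names in a nested list. A minimal regex-only sketch of that shape follows. It assumes a single form, double-quoted attribute values, and no > inside any attribute; form_summary is a hypothetical helper name, not part of any library. A negated character class such as [^>] also matches newlines, so the multi-line string needs no extra flags.

```python
import re

s = '''<form id="search" method="get" action="search.php">
      <input type="text" name="query" value="Search"/>
      <input type="submit" value="Submit">
</form> '''

def form_summary(html):
    # Hypothetical helper: returns [method, action, [names]] for the first form.
    # [^>]* keeps each search inside a single tag, even across line breaks.
    method = re.search(r'<form\b[^>]*\bmethod="([^"]*)"', html)
    action = re.search(r'<form\b[^>]*\baction="([^"]*)"', html)
    # Only <input> tags that carry a name attribute contribute a name.
    names = re.findall(r'<input\b[^>]*\bname="([^"]*)"', html)
    return [method.group(1) if method else None,
            action.group(1) if action else None,
            names]

result = form_summary(s)
print(result)  # ['get', 'search.php', ['query']]
```

This still breaks on single-quoted or unquoted attributes and on commented-out markup, which is why a real parser remains the safer choice.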

Mazdak
1

I agree with Michal Charemza's comment: read the post linked there first.

Here is an example using lxml, a very powerful tool for parsing and analyzing HTML.

from lxml.html import fromstring

html = fromstring("""<form id="search" method="get" action="search.php">
                     <input type="text" name="query" value="Search"/>
                     <input type="submit" value="Submit">
                     </form> """)
form = html.forms[0]  # select the first form in the HTML page

# Extract the data from the form.
# Note: lxml returns form.method in upper case ("GET").
print(form.action, form.method, form.inputs.keys())

Enjoy,

Abdul

abdul
0

You could use the BeautifulSoup library.

>>> from bs4 import BeautifulSoup
>>> s = '''<form id="search" method="get" action="search.php">
...       <input type="text" name="query" value="Search"/>
...       <input type="submit" value="Submit">
... </form> '''
>>> soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly
>>> k = []
>>> for i in soup.find_all('form'):
...     k.append(i['method'])
...     k.append(i['action'])
...     # keep only the inputs that actually carry a name attribute
...     k.append([j['name'] for j in i.find_all('input', attrs={'name': True})])
...
>>> k
['get', 'search.php', ['query']]
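If installing a third-party library is not an option, the same idea can be sketched with the standard library's html.parser module. This is only an illustration: FormCollector is a hypothetical class name, and it makes the simplifying assumption that every named input belongs to the most recently opened form.

```python
from html.parser import HTMLParser

class FormCollector(HTMLParser):
    """Hypothetical helper: records [method, action, [input names]] per form."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            # Start a new entry in the order the question asks for.
            self.forms.append([attrs.get('method'), attrs.get('action'), []])
        elif tag == 'input' and self.forms and attrs.get('name'):
            # Simplification: attach the input to the latest form seen.
            self.forms[-1][2].append(attrs['name'])

s = '''<form id="search" method="get" action="search.php">
      <input type="text" name="query" value="Search"/>
      <input type="submit" value="Submit">
</form> '''

collector = FormCollector()
collector.feed(s)
collector.close()
print(collector.forms)  # [['get', 'search.php', ['query']]]
```

Self-closing tags such as the first input reach handle_starttag as well, because the default handle_startendtag forwards to it.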
Winand
Avinash Raj
  • Why even use `re` here? Just add the name argument to the list as you already are, no need to regex out the name from the element converted to a string... – Jon Clements Mar 01 '15 at 15:45