I have HTML code that includes a form element, and I need to extract the whole HTML code of that form element. For example, given the HTML code below:

...
<p>Sample</p>
<img src="..." />
<form method="post" >
    <input type="hidden" value="v1" id="v1" name="task">
    <input type="hidden" value="v2" name="v2">
    ...
</form>
<div>...</div>
...

I want to extract this code:

<form method="post" >
    <input type="hidden" value="v1" id="v1" name="task">
    <input type="hidden" value="v2" name="v2">
    ...
</form>

Since I am not very familiar with preg_match expressions, I can hardly figure it out. I googled to understand the expressions myself, but could only grasp a small portion.

Can anyone help me, please? Best regards.

Andrey Adamovich
  • [Thou shalt not use regular expressions to parse (X)HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html). The accepted answer to the linked question should give you the necessary hints. – Linus Kleen Mar 02 '11 at 11:18
  • @Linus: Don't forget the classic *[You can't parse XHTML with regex](http://stackoverflow.com/questions/1732454)* – user1686 Mar 02 '11 at 11:55
  • @grawity Yes. My all-time favorite. I alternate between this one and the other when taking out the XHTML-regex-whip. – Linus Kleen Mar 02 '11 at 12:05

3 Answers

2

The regular expression to match the form tag may look like this: "(?smi)<form.*?</form>"

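EDIT 1: In PHP the function call will look like this: preg_match('/<form.*?<\/form>/si', $data, $matches). After a successful match, $matches[0] holds the form markup. (Anchoring the pattern with ^.*? and .*$ would make it match the whole document instead of just the form.)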

EDIT 2: This can be tested here: http://www.spaweditor.com/scripts/regex/index.php

But in general, I also wouldn't advise using regular expressions to parse HTML code.
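
For reference, here is a minimal sketch of the regex approach, assuming the page contains a single, non-nested form. The sample markup in $data and the variable name $formHtml are made up for illustration; only the pattern idea comes from the answer above.

```php
<?php
// Hypothetical sample input, modeled on the markup in the question.
$data = <<<'HTML'
<p>Sample</p>
<form method="post" >
    <input type="hidden" value="v1" id="v1" name="task">
    <input type="hidden" value="v2" name="v2">
</form>
<div>after</div>
HTML;

// s: "." also matches newlines; i: case-insensitive matching.
// The lazy ".*?" stops at the first closing </form> tag.
$formHtml = null;
if (preg_match('/<form\b.*?<\/form>/si', $data, $matches)) {
    // $matches[0] is the full text matched by the pattern.
    $formHtml = $matches[0];
}

echo $formHtml, "\n";
```

The \b after "form" is an addition so that a tag name merely starting with "form" is not picked up. If the page has several forms, preg_match_all with the same pattern should collect all of them.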

Andrey Adamovich
1

For something as trivial as matching a form tag in HTML, just don't use regular expressions or third-party XHTML parsers.

Use the default DOM parser instead.

It's as simple as:

// Create a new DOM Document to hold our webpage structure 
$xml = new DOMDocument(); 

// Load the html's contents into DOM 
$xml->loadHTML($html); 

$forms = array(); 

//Loop through each <form> tag in the dom and add it to the $forms array 
foreach($xml->getElementsByTagName('form') as $form) { 
    //Get the node's html string
    $forms[] = $form->ownerDocument->saveXML($form); 
}

where $forms is an array containing the markup of every form as a string.
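
A possible variant of the snippet above, as a sketch only: it assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node argument, so each form is serialized with HTML rules rather than XML rules. It also silences libxml warnings, which real-world pages tend to trigger. The sample markup in $html is made up for illustration.

```php
<?php
// Hypothetical sample input, modeled on the markup in the question.
$html = <<<'HTML'
<html><body>
<p>Sample</p>
<form method="post">
    <input type="hidden" value="v1" id="v1" name="task">
    <input type="hidden" value="v2" name="v2">
</form>
<div>after</div>
</body></html>
HTML;

// Collect libxml warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$forms = array();
foreach ($doc->getElementsByTagName('form') as $form) {
    // saveHTML($node) serializes only this node, as HTML.
    $forms[] = $doc->saveHTML($form);
}

echo $forms[0], "\n";
```

With saveXML() the input tags come back self-closed (XML style); saveHTML() keeps them in HTML form, which is usually closer to the original source.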

Yann Milin
0

Using regular expressions to handle HTML is generally not a good idea. I'd suggest using an HTML parser instead. I had good results with this library: http://simplehtmldom.sourceforge.net/

Wukerplank