How to write a PHP regex to extract hidden fields' name/value pairs from HTML?
Viewed 768 times
1
-
You don't regex HTML - you parse it. – zellio Jul 27 '11 at 16:55
-
But this code doesn't work because the HTML is malformed:
function getHidden($formAsString) {
    $hidEles = array();
    $doc = new DOMDocument();
    $doc->loadHTML($formAsString);
    $xpath = new DOMXPath($doc);
    $query = "//input[@type='hidden']";
    $hidData = $xpath->query($query);
    foreach ($hidData as $field) {
        // type cast the value to string
        $name = (string) $field->getAttribute('name');
        $value = (string) $field->getAttribute('value');
        $hidEles[$name] = $value;
    }
    return $hidEles;
}
– Vidya Jul 27 '11 at 16:56
2 Answers
4
As per usual, DO NOT USE REGEX TO PROCESS HTML
Use DOM:
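$dom = new DOMDocument;
$dom->loadHTML('your html here');
$xp = new DOMXPath($dom);
// Select every <input type="hidden"> element in the document
$hidden = $xp->query("//input[@type='hidden']");
for ($i = 0; $i < $hidden->length; $i++) {
    // DOMNodeList does not support array access; use item()
    $field = $hidden->item($i);
    echo $field->getAttribute('name'), '=', $field->getAttribute('value'), "\n";
}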
EDIT: Just saw your comment about the malformed HTML. Use HTMLPurifier to clean up the HTML first. Hopefully it's not so mangled that Purifier can't clean it up to a state that DOM will accept.
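Separately from the Purifier route, the warnings themselves can usually be silenced by having libxml buffer its parse errors, since its HTML parser tends to recover from stray end tags on its own. The following is a minimal sketch of that idea, not part of the original answer: the function name `hiddenFieldsLenient` and the sample markup are made up for illustration, and it assumes the standard `dom` and `libxml` extensions are enabled.

```php
<?php
// Hypothetical helper (not from the answer): collect hidden inputs as name => value.
function hiddenFieldsLenient(string $html): array
{
    // Ask libxml to buffer parse errors instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the buffered errors and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xp = new DOMXPath($dom);
    foreach ($xp->query("//input[@type='hidden']") as $input) {
        if ($input instanceof DOMElement) {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Usage with deliberately broken markup (stray </a>, unquoted attributes).
$html = '<form><input type="hidden" name="token" value="abc123"></a>'
      . '<input type=hidden name=id value=42></form>';
print_r(hiddenFieldsLenient($html));
```

Restoring the previous `libxml_use_internal_errors` setting keeps the helper from changing error handling for the rest of the script.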

Marc B
-
Hi, I already have the same type of code, but I am getting a lot of errors like:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : a in Entity, line: 660 in C:\wamp\www\curl2.php on line 8
Call Stack
# Time Memory Function Location
1 0.0006 706192 {main}() ..\Curl2.php:0
2 12.0399 814832 getHidden() ..\Curl2.php:41
3 12.0400 815640 DOMDocument->loadHTML()
– Vidya Jul 27 '11 at 16:59
-
It would be nice to have a `DO NOT USE REGEX TO PROCESS HTML` button. All it would do would be to paste that phrase. – cwallenpoole Jul 27 '11 at 16:59
-
@cwallenpoole: Automatically closing any question that has "regex" and "html" in the text would be nice. – Marc B Jul 27 '11 at 17:00
-
With 40k rep, can't you more-or-less do things like that? You're double the magical number – cwallenpoole Jul 27 '11 at 17:12
-
@cwallenpoole: Nope, I can't auto-close, only vote to close. Not sure what the required rep is for insta-close. – Marc B Jul 27 '11 at 17:13
-
@Vidya: http://stackoverflow.com/questions/1148928/disable-warnings-when-loading-non-well-formed-html-by-domdocument-php – Marc B Jul 27 '11 at 17:14
0
Your problem is that you actually have invalid HTML that DOMDocument is not able to parse. You can work around that by repairing it first, for example with the Tidy PHP extension. It's very easy:
$html = 'your HTML here';
$html = tidy_repair_string($html);
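To show how the repaired string might feed back into the DOM approach, here is a rough sketch under stated assumptions. The helper name `hiddenFieldsViaTidy` and the sample form are hypothetical; the code assumes the `tidy` extension is loaded and, where it is not, simply passes the raw markup on to DOM with libxml's warnings buffered.

```php
<?php
// Hypothetical helper: repair markup with Tidy (when available), then read
// hidden inputs with DOM. Not taken from the answer above.
function hiddenFieldsViaTidy(string $html): array
{
    // Assumption: ext/tidy is installed. If not, skip the repair step.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string($html, [], 'utf8');
        if (is_string($repaired) && $repaired !== '') {
            $html = $repaired;
        }
    }

    // Buffer any remaining libxml parse errors rather than emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xp = new DOMXPath($dom);
    foreach ($xp->query("//input[@type='hidden']") as $input) {
        if ($input instanceof DOMElement) {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Usage: <p> and <b> are never closed in this sample form.
$html = '<form><p><b>Login<input type="hidden" name="csrf" value="xyz">'
      . '<input type="hidden" name="step" value="2"></form>';
print_r(hiddenFieldsViaTidy($html));
```

Tidy returns a complete document (with `html`, `head` and `body` added), which is why the XPath query searches the whole tree instead of assuming the form is the root element.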