
I'm modifying a PHP script that currently outputs a nested form, something like:

<form name="input" action="html_form_action.asp" method="get">
<p>stuff here, this may or may not be in a div, script, etc.</p>
<form name="input" action="html_form_action.asp" method="get">
<div>stuff here possibly</div>
Username: <input type="text" name="user" />
<input type="submit" value="Submit" />
</form> 
<p>otherstuff, this may or may not be in a div, script, etc.</p>
</form> 

Nested forms are a no-no (IE hates them, and they basically make the form stop working), so I need to remove the nested form tags but keep the form items inside them. I need to remove the nested:

<form name="input" action="html_form_action.asp" method="get">

and

</form> 

but not the outer <form> and </form> tags, or the input and submit elements.

Is this possible to do with regex?

Note: the reason I just want to regex out the form rather than fix the underlying problem is that getting rid of the double form will take significant reworking. The regex is a quick fix for now.

Kobi
that0n3guy
  • I think you're approaching this wrong. Don't go putting out fires of a defective script. Fix the script itself. – Madara's Ghost Mar 31 '12 at 15:28
  • It cannot be done using regular expressions. Regular expressions cannot handle nesting. You can write a parser quickly for that specific task. – mrk Mar 31 '12 at 15:35
  • You can't parse HTML with regex http://stackoverflow.com/a/1732454/477127 – GordonM Mar 31 '12 at 15:39
  • @Truth, I know that; I even said as much in my note. The script isn't just a script... it's Drupal with Ubercart, so simply "fixing" my particular problem is a longer job. I needed a fix today, and I think regex will do it. But you are right, and it will get fixed the right way, just not today. – that0n3guy Apr 01 '12 at 01:11
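The quick purpose-built parser mrk suggests could look roughly like the PHP sketch below. A regex is used only to find the individual <form ...> and </form> tags; a depth counter then decides which tags to keep, so it is not limited to one level of nesting. strip_nested_forms() is a hypothetical helper name, and the sketch assumes the form tags are balanced and never appear inside HTML comments, <script> bodies, or attribute values containing ">".

```php
<?php
// Sketch only: drop every <form ...> / </form> tag that sits inside another form,
// keeping the outermost pair and everything between the tags.
// Assumes balanced form tags that never appear inside comments, <script> bodies,
// or attribute values containing ">" (a regex tokenizer cannot tell those apart).
function strip_nested_forms(string $html): string
{
    $depth = 0; // number of <form> elements currently open

    $result = preg_replace_callback(
        '@<(/?)form\b[^>]*>@i', // group 1 is "/" for a closing tag, "" for an opening one
        function (array $m) use (&$depth): string {
            if ($m[1] === '') {
                // Opening tag: keep it only if it starts a top-level form.
                $depth++;
                return $depth === 1 ? $m[0] : '';
            }
            // Closing tag: keep it only if it closes the top-level form.
            $depth = max(0, $depth - 1);
            return $depth === 0 ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$html = '<form name="input" method="get"><p>stuff</p>'
      . '<form name="input" method="get">Username: <input type="text" name="user" /></form>'
      . '<p>otherstuff</p></form>';
echo strip_nested_forms($html), "\n";
// prints: <form name="input" method="get"><p>stuff</p>Username: <input type="text" name="user" /><p>otherstuff</p></form>
```

Because the nesting is tracked by a counter rather than by the pattern, deeper nesting such as a form inside a form inside a form collapses to a single outer form in one pass.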

1 Answer

That wasn't easy, but here's the code:

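// Drop one nested <form ...> and its </form>, keeping the outer form and the inner fields ("s" makes "." match newlines too, like the old "(.|[\r\n])").
$html = preg_replace('@(<form[^<>]+>)(.*)<form[^<>]+>(.*)</form>(.*)(</form>)@s', '$1$2$3$4$5', $html);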
abugnais
  • This works only for one form inside another; I did not test it on more complex cases. Good luck. – abugnais Mar 31 '12 at 17:26
  • I'll give you a +1 for the effort, but you should know that HTML shouldn't be parsed with RegEx. – Madara's Ghost Mar 31 '12 at 17:31
  • @Truth the question was specifically tagged as a regex question, but if you have a better method, please share it :) – abugnais Mar 31 '12 at 17:44
  • Awesome, I'll give this a try. For my use, I will never have more than one within the other. I'll post back and let ya know how it goes. – that0n3guy Apr 01 '12 at 01:12
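To see the "only 1 form inside another" caveat in action, here is the answer's pattern wrapped in a small function and run on two inputs. remove_nested_form_once() is a hypothetical name. The pattern uses the s modifier in place of (.|[\r\n]), which should match the same characters without an alternation on every character, and the function falls back to the original markup if preg_replace() returns null because of a PCRE error. This is a sketch for checking the behavior, not a tested drop-in for the Drupal output.

```php
<?php
// The answer's pattern with the "s" modifier instead of "(.|[\r\n])".
// remove_nested_form_once() is a hypothetical helper name for this sketch.
function remove_nested_form_once(string $html): string
{
    $result = preg_replace(
        '@(<form[^<>]+>)(.*)<form[^<>]+>(.*)</form>(.*)(</form>)@s',
        '$1$2$3$4$5', // outer open tag, text before, inner content, text after, outer close tag
        $html
    );

    // preg_replace() returns null on a PCRE error (e.g. a backtracking limit);
    // fall back to the unmodified markup in that case.
    return $result ?? $html;
}

// One form nested in another, as in the question: both inner tags are removed.
$single = '<form id="outer"><p>x</p><form id="inner">Username: <input name="user" /></form><p>y</p></form>';
echo remove_nested_form_once($single), "\n";
// prints: <form id="outer"><p>x</p>Username: <input name="user" /><p>y</p></form>

// Two forms nested side by side: a single call only unwraps the last one.
$double = '<form id="a">a<form id="b">b</form>c<form id="c">d</form>e</form>';
echo remove_nested_form_once($double), "\n";
// prints: <form id="a">a<form id="b">b</form>cde</form>
```

With a single nested form, one call does what the question asks. With two nested forms, one call leaves the first one in place; calling it again on that result happens to remove it in this example, but nothing guarantees that for arbitrary markup.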