
I'm having trouble extracting the content of HTML forms (or of any other tag, for that matter). I've tried

    forms = re.findall('<form.*/form>', htmltext)

but with no results. Where's the mistake?

AnotherUser
  • You'd be far better off using an HTML parser; BeautifulSoup is excellent. – Martijn Pieters Jun 03 '14 at 14:35
  • Thanks to both for the advice. I still don't understand why the regexp isn't working though. – AnotherUser Jun 03 '14 at 14:39
  • Never ever ever ever parse html with regex http://blog.codinghorror.com/parsing-html-the-cthulhu-way/ – That1Guy Jun 03 '14 at 14:39
  • Please read http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – timgeb Jun 03 '14 at 14:41
  • Thanks, those were real eye-openers! But what if the line I posted above (corrected, of course) is the only parsing I need in a program? Is it still worth importing external libraries, or writing many more lines of code with e.g. HTMLParser? – AnotherUser Jun 03 '14 at 14:48

1 Answer


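That pattern only works if the whole form sits on a single line, because by default `.` does not match newline characters. Pass re.DOTALL as a flag so that `.` matches newlines too: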

    forms = re.findall('<form.*/form>', htmltext, re.DOTALL)

You could also combine the flags as re.IGNORECASE | re.DOTALL in case you need to catch something like <Form ...
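
To see the effect of the flags side by side, here is a minimal sketch. The sample `htmltext` is made up for illustration, and the non-greedy `.*?` variant is an addition of mine rather than part of the original pattern: with a greedy `.*`, a page containing several forms would likely come back as one big match running from the first `<form` to the last `/form>`.

```python
import re

# Hypothetical sample input: two forms, each spanning several lines,
# the second one written with a capital F.
htmltext = """<html><body>
<form action="/a">
  <input name="x">
</form>
<p>between</p>
<Form action="/b">
  <input name="y">
</Form>
</body></html>"""

# No flags: '.' stops at every newline, so a multi-line form never matches.
no_flag = re.findall(r'<form.*/form>', htmltext)

flags = re.IGNORECASE | re.DOTALL

# Greedy '.*' with both flags: one match that swallows everything
# between the first opening tag and the last closing tag.
greedy = re.findall(r'<form.*/form>', htmltext, flags)

# Non-greedy '.*?' stops at the nearest '/form>', giving one match per form.
each_form = re.findall(r'<form.*?/form>', htmltext, flags)

print(len(no_flag), len(greedy), len(each_form))
```

This is still a regex, so it will trip over nested forms or a literal `/form>` inside an attribute or comment; for anything beyond a quick one-off, the parser suggested in the comments is the safer route.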