
date = re.search(r'<td>([\x\d\w-.\s,()&\"]+|)<br><font',page_data)

I am migrating code from PHP to Python and am using this regular expression in Python, where it doesn't work and fails with the error:

raise error, v # invalid expression

It works with PHP's preg_match and also on http://www.gskinner.com/RegExr. Any idea why this is happening? Thanks!

nubela

1 Answer

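\x on its own (with no hex digits after it) is invalid in Python; PHP probably just ignores it, while Python throws an exception. There is a second problem: inside the character class, \w-. is read as a range from \w to ., which Python rejects as a bad character range, whereas a - at the end of a class is just a literal hyphen. Try removing the \x, and also move the - to the end of the character class: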

date = re.search(r'<td>([\d\w.\s,()&\"-]+|)<br><font',page_data)
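A quick way to confirm both problems is to compile each pattern and catch re.error. This is a minimal sketch: the <td> and <br><font parts of the "broken" pattern are assumed from the corrected version above, the helper name compile_error is hypothetical, and page_data is a made-up sample row rather than the asker's real page.

```python
import re

# Original pattern (assumed), and the corrected one from the answer.
broken = r'<td>([\x\d\w-.\s,()&\"]+|)<br><font'
fixed = r'<td>([\d\w.\s,()&\"-]+|)<br><font'

def compile_error(pattern):
    """Return the re.error message if pattern fails to compile, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

print(compile_error(broken))     # complains about the bare \x
print(compile_error(r'[\w-.]'))  # "bad character range": \w cannot start a range
print(compile_error(r'[\w.-]'))  # None: a trailing '-' is a literal hyphen
print(compile_error(fixed))      # None

# Hypothetical sample of page_data, just to show the fixed pattern matching.
page_data = '<tr><td>May 26, 2010 (Wed)<br><font size="1">x</font></td></tr>'
m = re.search(fixed, page_data)
print(m.group(1) if m else None)
```

Note that \d is redundant next to \w (which already matches digits); it is kept only to stay close to the original pattern.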

In any case, parsing HTML with regular expressions is unlikely to make you very happy.
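For comparison, here is a minimal sketch of the parser-based approach using Python's standard-library html.parser. The class name CellTextParser and the goal "text of each <td> up to its first <br>" are hypothetical, chosen to mirror what the regex above seems to extract; the sample HTML is made up.

```python
from html.parser import HTMLParser

class CellTextParser(HTMLParser):
    """Collect the text of each <td>, stopping at the first <br> inside it."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True: &amp; etc. are decoded
        self.cells = []
        self._buf = None            # text pieces of the current cell, None outside a cell
        self._done = False          # True once a <br> ends the part we want

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf, self._done = [], False
        elif tag == "br" and self._buf is not None:
            self._done = True

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.cells.append("".join(self._buf).strip())
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None and not self._done:
            self._buf.append(data)

parser = CellTextParser()
parser.feed('<table><tr><td>May 26, 2010 (Wed)<br><font size="1">x</font></td>'
            '<td>Q&amp;A<br>y</td></tr></table>')
parser.close()
print(parser.cells)
```

Unlike the regex, this does not care about attribute order, extra whitespace inside tags, or which characters appear in the cell, and it decodes entities such as &amp; for you.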

Tim Pietzcker
    RE: Parsing X?HTML with regexes: [DON'T DO IT](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Hank Gay May 26 '10 at 17:49