Python2 regular expressions seem faulty

Question

I'm using Python 2.7.3 on Linux. Here is an interpreter session, verbatim:

>>> f = open("feed.xml")
>>> text = f.read()
>>> import re
>>> regexp1 = re.compile(r'</?item>')
>>> regexp2 = re.compile(r'<item>.*</item>')
>>> regexp1.findall(text)
['<item>', '</item>', '<item>', '</item>', '<item>', '</item>', '<item>', '</item>']
>>> regexp2.findall(text)
[]

Is this a bug, or is there something I'm not understanding about Python regular expressions?

score 5 · Accepted Answer · answered Jul 30 '12 at 15:39

By default, '.' does not match a newline, so '.*' cannot get from an <item> on one line to an </item> on a later line. Try compiling with re.DOTALL:

regexp2 = re.compile(r'<item>.*</item>', re.DOTALL)
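To see the difference without the original file, here is a minimal sketch. The `text` string is a made-up stand-in for feed.xml (the real file isn't shown in the question), and the non-greedy `.*?` variant is an addition of mine, not part of the pattern above. With re.DOTALL the greedy `.*` runs from the first <item> to the last </item>, so you would likely want `.*?` to get one match per item.

```python
import re

# Hypothetical stand-in for the contents of feed.xml: each item spans several lines.
text = """<rss>
<item>
<title>First</title>
</item>
<item>
<title>Second</title>
</item>
</rss>"""

# Without re.DOTALL, '.' stops at every newline, so nothing matches.
no_flag = re.findall(r'<item>.*</item>', text)

# With re.DOTALL, '.' also matches newlines. The greedy '.*' then runs from
# the first <item> to the last </item>, producing a single large match.
greedy = re.findall(r'<item>.*</item>', text, re.DOTALL)

# The non-greedy '.*?' stops at the nearest </item>: one match per item.
lazy = re.findall(r'<item>.*?</item>', text, re.DOTALL)

print("%d %d %d" % (len(no_flag), len(greedy), len(lazy)))  # prints "0 1 2"
```
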

chepner


score 0 · Answer 2 · edited May 23 '17 at 11:48

Here is the best answer to this question: Don't use regular expressions to parse non-regular languages such as XML. It drove one S-O user insane. Another relevant link.
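For a quick hack, the standard library's xml.etree.ElementTree is about as short as the regex version. This is only a sketch: the sample `text` and the <title> child elements are assumptions, since the real feed.xml isn't shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents; the structure of the real feed.xml is assumed.
text = """<rss><channel>
<item>
<title>First</title>
</item>
<item>
<title>Second</title>
</item>
</channel></rss>"""

root = ET.fromstring(text)

# iter('item') walks the whole tree and yields every <item> element,
# regardless of how the file is broken into lines.
items = list(root.iter('item'))

# findtext returns the text of the first matching child, or None if absent.
titles = [item.findtext('title') for item in items]

print(titles)
```

To read from the file instead of a string, ET.parse("feed.xml").getroot() gives the same kind of root element.
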

answered Jul 30 '12 at 15:37

Claudiu


This doesn't address his misunderstanding of regular expressions, however. – chepner Jul 30 '12 at 15:43
A valid point, but I'm only using this code for a quick hack and thus don't want or need to learn any new APIs. – Jangler Jul 30 '12 at 15:48
I finally followed the link to the insane S-O user. I'd retract my downvote for that if I could :) – chepner Jul 30 '12 at 16:00
@chepner: made a trivial (whitespace only) edit so you can retract the downvote. – Fred Foo Jul 30 '12 at 16:50
@Jangler: quick hacks often become scripts that you rely on. if you learn the new API then you can do a quick hack with the new API – Claudiu Jul 30 '12 at 20:02
