regular expression no hits in python

Question

I have the following regular expression

(?<=<TEXT>).*?(?=</TEXT>)

which is supposed to find anything between <TEXT> and </TEXT>.

I pasted my string into http://pythex.org/ and the pattern matches there, but the following Python code does not find anything:

import re
re.findall(r'(?<=<TEXT>).*?(?=</TEXT>)', text)

where `text` contains exactly what I pasted into pythex (I copied the variable's value from the debugger). Is there something special I need to pay attention to?

Some additional output

>>> pattern = re.compile(r"(?<=<TEXT>).*?(?=</TEXT>)")
>>> print(pattern)
re.compile('(?<=<TEXT>).*?(?=</TEXT>)')
>>> re.DOTALL
16
>>> pattern.findall(text)
[]

Your code doesn't even work. `enc` is an invalid argument for `open` and it seems that `file` is a filename. — vaultah, Feb 02 '16 at 16:40
Maybe there are linebreaks between the opening and closing tags? Did you expect `re.DOTALL` to globally activate dotall-mode? You have to [pass the flag to the function](https://docs.python.org/3/library/re.html#re.compile). — tobias_k, Feb 02 '16 at 16:51

FooBar · Answer 1 · 2016-02-02T17:18:07.933

0

I get the "correct" output with

re.findall(r'(?<=<TEXT>).*?(?=</TEXT>)', text, re.DOTALL)

I assumed the default flags in `re` were the same as on pythex, which they apparently are not.
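
To illustrate the difference, here is a minimal sketch. The sample `text` is made up for the demonstration and is not the data from the question. By default `.` does not match a newline, so a `<TEXT>` block that spans several lines gives no match unless `re.DOTALL` is passed, either to `re.findall` or to `re.compile`.

```python
import re

# Hypothetical sample input; the real `text` from the question is not shown.
text = "<DOC>\n<TEXT>\nfirst line\nsecond line\n</TEXT>\n</DOC>"

pattern = r"(?<=<TEXT>).*?(?=</TEXT>)"

# Without flags, "." stops at newlines, so nothing between the tags matches.
without_flag = re.findall(pattern, text)

# With re.DOTALL, "." also matches newline characters.
with_flag = re.findall(pattern, text, re.DOTALL)

# The flag can equally be supplied when compiling the pattern.
compiled = re.compile(pattern, re.DOTALL)
with_compiled = compiled.findall(text)

print(without_flag)  # []
print(with_flag)     # ['\nfirst line\nsecond line\n']
```

Merely evaluating `re.DOTALL` at the prompt, as in the question, only prints the flag's value; it has no effect on patterns that were compiled without it.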

edited Feb 02 '16 at 17:18

answered Feb 02 '16 at 16:54

FooBar

Dropping the `?` after the `.*` changes the behaviour to greedily include everything from the first `<TEXT>` to the last `</TEXT>`. That said, it works for me with `re.findall(r'(?<=<TEXT>).*?(?=</TEXT>)', text, re.DOTALL)` – F1Rumors Feb 02 '16 at 17:11
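
The greedy versus non-greedy behaviour mentioned in this comment can be sketched as follows. The input string is a made-up example with two separate blocks, not the asker's data.

```python
import re

# Hypothetical input containing two separate <TEXT> blocks.
text = "<TEXT>one</TEXT> junk <TEXT>two</TEXT>"

# Non-greedy ".*?" stops at the nearest closing tag, giving one hit per block.
lazy = re.findall(r"(?<=<TEXT>).*?(?=</TEXT>)", text, re.DOTALL)

# Greedy ".*" runs on to the last closing tag and swallows everything between.
greedy = re.findall(r"(?<=<TEXT>).*(?=</TEXT>)", text, re.DOTALL)

print(lazy)    # ['one', 'two']
print(greedy)  # ['one</TEXT> junk <TEXT>two']
```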

score 0 · Answer 2 · edited May 23 '17 at 12:23

0

It looks like you really ought to be considering a token parser rather than regular expressions. Is this XML or HTML input? In that case, you might want to look at this question and its top answer: How Do I Parse XML in Python
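
As a rough sketch of what the parser route could look like, assuming the input is well-formed XML (the question does not confirm this, and the sample document below is invented), the standard library's `xml.etree.ElementTree` can collect the contents of every `<TEXT>` element without any regular expression:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (an assumption, not the real input).
text = (
    "<DOC>\n"
    "<TEXT>\nfirst line\nsecond line\n</TEXT>\n"
    "<TEXT>more</TEXT>\n"
    "</DOC>"
)

root = ET.fromstring(text)

# iter("TEXT") walks every <TEXT> element at any depth below the root.
contents = [element.text for element in root.iter("TEXT")]

print(contents)  # ['\nfirst line\nsecond line\n', 'more']
```

Note that `element.text` is `None` for an empty element, and `ET.fromstring` raises `ParseError` on input that is not well-formed, so this only helps if the data really is XML.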

edited May 23 '17 at 12:23

Community

answered Feb 02 '16 at 17:13

F1Rumors
