Python Regex to find whitespace, end of string, and/or word boundary

Question

I am using re in Python 2.7.5 for regex. I am trying to have it match foobar.com/1, foobar.com/12, foobar.com/123, or foobar.com/1324, but not foobar.com/ or foobar.com/12345.

My current regex is foobar\.com/\d\d?\d?\d?\W, but this only matches when a non-word character follows the digits, so it fails when the digits are at the end of the string.

How do I make it match strings with any character except for an alpha-numeric?

Code:

import re

pattern1 = re.compile(r"foobar\.com/\d\d?\d?\d?\W")
match = pattern1.search(comment.body)
print match

Input:

foobar.com/12345

random text

[relevant](http://foobar.com/1319)

foobar.com/567

other comment

random comment

foobar.com/1302/

foobar.com

foobar.com/201

This is a test

You are looking at VI model 1.7 AGB Commander Shepard. Please see a store clerk to unlock a demo of this model.

Listen, if you don't have the credits just...tear me out of the terminal. Or somehting.

I sound seven percent more like Commander Shepard than any other bootleg VI copy.

SHEPHERDVI

SHEPARDVI

shepherdvi

You want help solving your problems? Get me out of this damn demo mode.

Shepard VI

Hey it works

Commander Shepard. Allicance Navy.

Commander Shepard. Allicance Navy.

TestShepard

TestShepard

Onelasttest

I sound seven percent more like Commander Shepard than any other bootleg VI copy.

(Strings separated by double new line, strings #3, 4, 7, and 9 should match.)

Output:

None
None
<_sre.SRE_Match object at 0x103f1a578>
None
None
None
<_sre.SRE_Match object at 0x103f1a578>
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None

`foobar\.com/\d{1,4}\b`. Note that you need to escape the dot. `\b` is an assertion that matches the empty string at a word boundary. — Tim Peters, Jan 23 '14 at 05:39
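
To illustrate the difference this comment points at, here is a small sketch comparing the question's `\W` suffix, which must consume a character, with the zero-width `\b`. The sample strings are taken from the question's input; the helper name `matches` is made up for this example.

```python
import re

# Pattern from the question: \W must consume one non-word character.
consuming = re.compile(r"foobar\.com/\d\d?\d?\d?\W")
# Pattern from the comment: \b is zero-width, so it also succeeds at end of string.
boundary = re.compile(r"foobar\.com/\d{1,4}\b")

samples = [
    "foobar.com/12345",
    "[relevant](http://foobar.com/1319)",
    "foobar.com/567",
    "foobar.com/1302/",
    "foobar.com",
    "foobar.com/201",
]

def matches(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

for s in samples:
    print("%-36s \\W: %-20r \\b: %r" % (s, matches(consuming, s), matches(boundary, s)))
```

With `\W`, only the strings where something like `)` or `/` follows the digits match, which is consistent with the two matches in the question's output. With `\b`, the strings ending in digits match as well, and foobar.com/12345 is still rejected.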

score 2 · Answer 1 · answered Jan 23 '14 at 05:41

foobar\.com/\d{1,4}\b

Will do the trick.

Newb


It appears not. With `pattern.search()` I am getting 0 matches, while with the regex in my question I get 2 matches out of the 4 desired. – ZuluDeltaNiner Jan 23 '14 at 05:44
@ZuluDeltaNiner, that's not helpful: **edit your question** to show exactly the code you ran, the output you got, and the output you want. Nobody can read your mind ;-) – Tim Peters Jan 23 '14 at 05:45
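
One possible explanation for the zero matches, which the thread does not confirm: if the pattern was written as an ordinary string literal rather than a raw string, Python turns `\b` into a backspace character (`\x08`) before `re` ever sees it. A minimal sketch of that pitfall, using a sample string from the question:

```python
import re

text = "foobar.com/567"

# Ordinary string literal: "\b" is the backspace character, not a word boundary.
# (The other backslashes are doubled here so that only \b shows the problem.)
not_raw = "foobar\\.com/\\d{1,4}\b"
# Raw string literal: the backslash and the b reach the regex engine intact.
raw = r"foobar\.com/\d{1,4}\b"

print(repr(not_raw[-1]))               # the last character is '\x08'
print(re.search(not_raw, text))        # None: the text contains no backspace
print(re.search(raw, text).group(0))   # foobar.com/567
```

Writing regex patterns as raw strings (`r"..."`) avoids this class of problem.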

score 2 · Accepted Answer · answered Jan 23 '14 at 06:02

... or you could use the negative lookahead (?!...) to make sure there is not a fifth digit.

>>> re.findall(r'foobar[.]com/\d{1,4}(?!\d)', comment.body)
['foobar.com/1319', 'foobar.com/567', 'foobar.com/1302', 'foobar.com/201']
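
The lookahead and `\b` agree on the question's input, but they are not interchangeable. As a sketch (the input strings below are made up for illustration and do not come from the question), `(?!\d)` only forbids a following digit, while `\b` forbids any following word character, including letters and underscores:

```python
import re

lookahead = re.compile(r"foobar[.]com/\d{1,4}(?!\d)")
boundary = re.compile(r"foobar\.com/\d{1,4}\b")

# Hypothetical inputs, not from the question.
text = "foobar.com/12abc foobar.com/12_3 foobar.com/12345 foobar.com/99."

print(lookahead.findall(text))  # ['foobar.com/12', 'foobar.com/12', 'foobar.com/99']
print(boundary.findall(text))   # ['foobar.com/99']
```

Which one fits depends on whether digits followed directly by letters, as in foobar.com/12abc, should count as a match.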

dnozay

