1

Consider this regex:

<a href(="(?:/user)?/([^"]+))">

What I want is this: if the second capturing group contains only digits, the regex should not match. An example:

<a href="/user/15594243">
#this should not match

Is there any solution for that? I want a regex-only solution; I know I can achieve this with further Python code.

gpoo
Aamir Rind

5 Answers

2

I think a negative lookahead assertion for all digits followed by a quote is all that is needed:

user_re = re.compile('<a href(="/(?!(?:user/)?[0-9]+").+)"')

In [74]: [(url,user_re.match(url) and user_re.match(url).group(1)) for url in 
                 ['<a href="/user/15594243">',
                  '<a href="/user/15594243_">',
                  '<a href="/user/user15594243">',
                  '<a href="/user/1">',
                  '<a href="/user/15594243/add">',
                  '<a href="/item/15594243">',
                  '<a href="/a"',
                  '<a href="/15594243">']]
Out[74]: 
[('<a href="/user/15594243">', None),
 ('<a href="/user/15594243_">', '="/user/15594243_'),
 ('<a href="/user/user15594243">', '="/user/user15594243'),
 ('<a href="/user/1">', None),
 ('<a href="/user/15594243/add">', '="/user/15594243/add'),
 ('<a href="/item/15594243">', '="/item/15594243'),
 ('<a href="/a"', '="/a'),
 ('<a href="/15594243">', None)]

EDIT: I know my last edit does the regex twice but that is just for display purposes.

Phil Cooper
  • in your regex you have `/user/` while it is `(?:/user)?/` – Aamir Rind Jul 15 '12 at 17:07
  • @Aamir Adnan, sure, I'll tweak it, but I wasn't sure about the exact requirements. I see you were even including the `=` in the return group. I'll edit to match your original in those regards. – Phil Cooper Jul 15 '12 at 17:10
  • @AamirAdnan OK I've edited it. The opening `"` is included but the closing `"` is not as in your original but you can take it from here I think. – Phil Cooper Jul 15 '12 at 17:25
  • I think you are missing the point that `/user` is optional in my regex; in your regex this `''` case will still match, which it should not – Aamir Rind Jul 15 '12 at 17:41
  • @AamirAdnan I did miss that. For SO questions like this, the best questions have a set of inputs and your expected output. If you can do that, it eliminates any ambiguity as to what you are looking for. It also forces you to think in terms of [TDD](http://www.agiledata.org/essays/tdd.html). Hope this works now – Phil Cooper Jul 15 '12 at 17:55
0

What about

<a href(="(?:/user)?/([^"/]*?[^0-9"/][^"/]*?))">

? We need to include the / in the negated character classes; otherwise the regex skips /user, since it is optional, and takes user/ as the non-numeric part.
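A quick way to check this pattern in Python is sketched below. The helper name `second_group` and the sample tags are assumptions made for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer above: every negated class also excludes "/".
PATTERN = re.compile(r'<a href(="(?:/user)?/([^"/]*?[^0-9"/][^"/]*?))">')

def second_group(tag):
    """Return the second capturing group, or None when the tag does not match."""
    m = PATTERN.match(tag)
    return m.group(2) if m else None

# Illustrative sample tags (assumptions, not taken from the answer).
for tag in ['<a href="/user/15594243">',    # digits only
            '<a href="/user/15594243_">',   # contains a non-digit
            '<a href="/15594243">',         # digits only, no /user prefix
            '<a href="/user/bob">']:        # no digits at all
    print(tag, '->', second_group(tag))
```

With these samples the two digit-only tags should print None, while the other two yield 15594243_ and bob. One consequence of excluding / from the classes is that deeper paths such as /user/15594243/add do not match either.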

user1494736
0

Use this for the second capturing group.

\d*[a-zA-Z]+[a-zA-Z0-9]*

This allows the value to start with digits, requires at least one letter, and allows any further alphanumeric characters after that.
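As a rough sketch, the sub-pattern can be dropped into the regex from the question and tried in Python. How the two are combined here is an assumption, as are the helper name `second_group` and the sample tags.

```python
import re

# Sub-pattern from the answer above.
GROUP = r'\d*[a-zA-Z]+[a-zA-Z0-9]*'

# Assumed combination: the question's regex with its second group replaced.
PATTERN = re.compile(r'<a href(="(?:/user)?/(' + GROUP + r'))">')

def second_group(tag):
    """Return the second capturing group, or None when the tag does not match."""
    m = PATTERN.match(tag)
    return m.group(2) if m else None

print(second_group('<a href="/user/15594243">'))    # digits only
print(second_group('<a href="/user/15594243a">'))   # digits, then a letter
print(second_group('<a href="/user/abc123">'))      # letters, then digits
print(second_group('<a href="/user/15594243_">'))   # underscore is outside the classes
```

The first call should give None and the next two should give 15594243a and abc123. Because the classes only cover ASCII letters and digits, the last tag is rejected as well, which may or may not be what is wanted.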

Community
Mendhak
0

You can use assertions. A lookbehind assertion won't work because it requires a fixed-width pattern, so let's use a lookahead.

reg = re.compile("<a href=\"(?:/user)?/(?![0-9]+\")([^\"/]+)\">")

This will work, but it also rejects URLs such as /user/test/u345 and /user/t/user, because a slash is not allowed in the captured group. That restriction is needed because your /user part is optional: without [^"/], the class [^"] would consume everything (as in /user/45).
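Whether the lookahead ends at the closing quote matters for values that merely start with a digit. The comparison below is a sketch under assumptions: both pattern variants, the names `UNANCHORED` and `ANCHORED`, the helper `captured`, and the sample tags are written for illustration only.

```python
import re

# Lookahead without the closing quote: fails as soon as a digit comes first.
UNANCHORED = re.compile(r'<a href="(?:/user)?/(?![0-9]+)([^"/]+)">')

# Lookahead ending at the closing quote: fails only for all-digit values.
ANCHORED = re.compile(r'<a href="(?:/user)?/(?![0-9]+")([^"/]+)">')

def captured(pattern, tag):
    """Return the single capturing group, or None when the tag does not match."""
    m = pattern.match(tag)
    return m.group(1) if m else None

for tag in ['<a href="/user/15594243">',
            '<a href="/user/15594243_">',
            '<a href="/user/test">',
            '<a href="/user/test/u345">']:
    print(tag, captured(UNANCHORED, tag), captured(ANCHORED, tag))
```

Both variants should reject the all-digit tag and the tag with the extra slash, and both should capture test. They differ on 15594243_, which only the variant ending at the quote captures.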

madfriend
-2

This'll do it, replace ([^"]+) with:

([^"]*?[^0-9"][^"]*?)

edit: Unless Python is Quaint with a capital Q, I genuinely don't know what y'all see wrong. From the JavaScript console this works:

>>> 'user/user1234"'.match(/\/([^"]*?[^0-9"][^"]*?)"/);
Array ["/user1234"", "user1234"]
>>> 'user/1234"'.match(/\/([^"]*?[^0-9"][^"]*?)"/);
null

So, are you telling me this isn't the case in Python? Why?

edit2: Aha, the optional /user fouls the results. This'll prevent it:

 <a href(="(?:/user)?/(?!user/)([^"]*?[^0-9"][^"]*?))">
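For what it's worth, the same pattern can be tried from Python with a small sketch like the one below. The helper name `second_group` and the sample tags are assumptions for illustration; the pattern is the one given just above.

```python
import re

# Pattern from the second edit above, with the (?!user/) guard.
PATTERN = re.compile(r'<a href(="(?:/user)?/(?!user/)([^"]*?[^0-9"][^"]*?))">')

def second_group(tag):
    """Return the second capturing group, or None when the tag does not match."""
    m = PATTERN.match(tag)
    return m.group(2) if m else None

for tag in ['<a href="/user/15594243">',       # digits only
            '<a href="/15594243">',            # digits only, no /user prefix
            '<a href="/user/user1234">',       # letters and digits
            '<a href="/user/15594243/add">']:  # digits plus a sub-path
    print(tag, '->', second_group(tag))
```

The two digit-only tags should print None, and the other two should give user1234 and 15594243/add, since the slash counts as a non-digit here.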
Wrikken