Matching a second group under specific conditions (using one single regex)

Question

Consider this regex:

<a href(="(?:/user)?/([^"]+))">

What I want is for the regex not to match if the second capturing group consists only of digits. An example:

<a href="/user/15594243">
#this should not match

Is there a solution for that? I want a regex-only solution; I know I can achieve this with additional Python code.
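
For reference, a quick check in Python confirms that the pattern as posted does match the numeric-only link, with the digits ending up in the second group. This is only a minimal sketch; the variable names are illustrative and not part of the question.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'<a href(="(?:/user)?/([^"]+))">')

m = pattern.search('<a href="/user/15594243">')
# The match succeeds and group 2 holds only digits, which is exactly
# the case the question wants to exclude.
second_group = m.group(2) if m else None
print(second_group)
```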

*Why* do you want to do this in a single regular expression without using a second expression or other code? — Todd A. Jacobs, Jul 15 '12 at 15:07
@CodeGnome please assume i have no other option rather than this single regex.. — Aamir Rind, Jul 15 '12 at 15:11

Phil Cooper · Accepted Answer · 2012-07-15T17:50:10.950

2

A negative lookahead assertion for all digits followed by a quote is all that I think is needed:

user_re = re.compile('<a href(="/(?!(?:user/)?[0-9]+").+)"')

In [74]: [(url,user_re.match(url) and user_re.match(url).group(1)) for url in 
                 ['<a href="/user/15594243">',
                  '<a href="/user/15594243_">',
                  '<a href="/user/user15594243">',
                  '<a href="/user/1">',
                  '<a href="/user/15594243/add">',
                  '<a href="/item/15594243">',
                  '<a href="/a"',
                  '<a href="/15594243">']]
Out[74]: 
[('<a href="/user/15594243">', None),
 ('<a href="/user/15594243_">', '="/user/15594243_'),
 ('<a href="/user/user15594243">', '="/user/user15594243'),
 ('<a href="/user/1">', None),
 ('<a href="/user/15594243/add">', '="/user/15594243/add'),
 ('<a href="/item/15594243">', '="/item/15594243'),
 ('<a href="/a"', '="/a'),
 ('<a href="/15594243">', None)]

EDIT: I know my last edit does the regex twice but that is just for display purposes.
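
If the inner capturing group from the question is still needed, the same lookahead can probably be kept while consuming the optional `user/` outside that group. The pattern below is an adaptation, not the one from this answer, and the helper name `second_group` is made up for the illustration. It also calls `match` only once per input.

```python
import re

# Assumption: same lookahead idea as the answer, but the optional "user/"
# is consumed outside the inner group so that group 2 holds only the rest
# of the path. Unlike the answer's pattern, this one also requires the
# closing ">" as in the question.
user_re = re.compile(r'<a href(="/(?!(?:user/)?[0-9]+")(?:user/)?([^"]+))">')

def second_group(tag):
    """Return group 2 for a matching tag, or None when the tag is rejected."""
    m = user_re.match(tag)
    return m.group(2) if m else None

tags = [
    '<a href="/user/15594243">',      # digits only: rejected
    '<a href="/15594243">',           # digits only, no /user: rejected
    '<a href="/user/15594243_">',
    '<a href="/user/15594243/add">',
    '<a href="/item/15594243">',
]
results = {tag: second_group(tag) for tag in tags}
for tag, group in results.items():
    print(tag, '->', group)
```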

edited Jul 15 '12 at 17:50

answered Jul 15 '12 at 16:43

Phil Cooper


in your regex you have `/user/` while it is `(?:/user)?/` – Aamir Rind Jul 15 '12 at 17:07
@Aamir Adnan, sure, I'll tweak it, but I wasn't sure about the exact requirements. I see you were even including the `=` in the return group. I'll edit to match your original in those regards. – Phil Cooper Jul 15 '12 at 17:10
@AamirAdnan OK I've edited it. The opening `"` is included but the closing `"` is not as in your original but you can take it from here I think. – Phil Cooper Jul 15 '12 at 17:25
I think you are missing the point that `/user` is optional in my regex; with your regex this `''` case will still match, which it should not – Aamir Rind Jul 15 '12 at 17:41
@AamirAdnan I did miss that. For SO questions like this, the best questions have a set of inputs and your expected output. If you can do that, it eliminates any ambiguity as to what you are looking for. It also forces you to think in terms of [TDD](http://www.agiledata.org/essays/tdd.html). Hope this works now – Phil Cooper Jul 15 '12 at 17:55

score 0 · Answer 2 · answered Jul 15 '12 at 15:14

0

What about

<a href(="(?:/user)?/([^"/]*?[^0-9"/][^"/]*?))">

? The character classes have to exclude the / as well: otherwise the regex skips /user (it is optional) and takes user/ as the non-numeric part...
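
A quick way to see what this pattern accepts is to run it over a few sample links. This is a sketch under the assumption that the samples below are representative; the helper name `second_group` is hypothetical. Note that excluding `/` from the group also rejects paths with a further segment.

```python
import re

# The pattern from this answer, unchanged.
answer_re = re.compile(r'<a href(="(?:/user)?/([^"/]*?[^0-9"/][^"/]*?))">')

def second_group(tag):
    """Return group 2 for a matching tag, or None when the tag is rejected."""
    m = answer_re.search(tag)
    return m.group(2) if m else None

tags = [
    '<a href="/user/15594243">',      # digits only: rejected
    '<a href="/15594243">',           # digits only, no /user: rejected
    '<a href="/user/user15594243">',
    '<a href="/user/15594243_">',
    '<a href="/user/15594243/add">',  # extra slash: also rejected
]
results = {tag: second_group(tag) for tag in tags}
for tag, group in results.items():
    print(tag, '->', group)
```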

answered Jul 15 '12 at 15:14

user1494736


score 0 · Answer 3 · edited May 23 '17 at 11:51

0

Use this for the second capturing group.

\d*[a-zA-Z]+[a-zA-Z0-9]*

This lets the group start with digits if you want, requires at least one letter, and optionally follows it with more alphanumeric characters.
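
Dropped into the question's pattern, the suggestion can be tried out as follows. The combined pattern is an assumption about how the fragment is meant to be used, and `second_group` is a hypothetical helper. As the comments on this answer note, names containing symbols are rejected even when they are not all digits.

```python
import re

# Assumption: the fragment replaces ([^"]+) in the question's pattern.
fragment_re = re.compile(r'<a href(="(?:/user)?/(\d*[a-zA-Z]+[a-zA-Z0-9]*))">')

def second_group(tag):
    """Return group 2 for a matching tag, or None when the tag is rejected."""
    m = fragment_re.search(tag)
    return m.group(2) if m else None

tags = [
    '<a href="/user/15594243">',   # digits only: rejected, as wanted
    '<a href="/user/123abc">',     # digits then letters: accepted
    '<a href="/user/--0123">',     # symbols: rejected although not all digits
    '<a href="/user/15594243_">',  # underscore: rejected although not all digits
]
results = {tag: second_group(tag) for tag in tags}
for tag, group in results.items():
    print(tag, '->', group)
```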

edited May 23 '17 at 11:51

Community


answered Jul 15 '12 at 15:14

Mendhak


It does not match the question. Since when is `/user/--0123` invalid? – Wrikken Jul 15 '12 at 15:22
Yes, Wrikken is right: your regex only works for alphanumeric names. What if there is another symbol? – Aamir Rind Jul 15 '12 at 16:31

madfriend · Answer 4 · 2012-07-15T15:25:44.780

You can use assertions. A lookbehind assertion won't work because it requires a fixed width, so let's use a lookahead.

reg = re.compile("<a href=\"(?:/user)?/(?![0-9]+\")([^\"/]+)\">")

This will work, but it rejects URLs such as /user/test/u345 and /user/t/user, because a slash is not allowed in the captured group. That restriction is needed because your /user part is optional: with [^"] instead of [^"/], the regex could skip the optional /user and let the group swallow the whole path, as in /user/45.
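
Whether the lookahead includes the closing quote matters: `(?![0-9]+)` fails as soon as a single digit follows, while `(?![0-9]+")` fails only when digits run all the way to the quote. A minimal sketch comparing the two spellings (the names `loose`, `anchored` and `captured` are made up for this illustration):

```python
import re

# Lookahead without the closing quote: rejects anything starting with a digit.
loose = re.compile(r'<a href="(?:/user)?/(?![0-9]+)([^"/]+)">')
# Lookahead with the closing quote: rejects only all-digit values.
anchored = re.compile(r'<a href="(?:/user)?/(?![0-9]+")([^"/]+)">')

def captured(regex, tag):
    """Return group 1 for a matching tag, or None when the tag is rejected."""
    m = regex.search(tag)
    return m.group(1) if m else None

tags = [
    '<a href="/user/15594243">',    # digits only
    '<a href="/user/15594243_">',   # starts with digits, but not all digits
    '<a href="/user/u345">',
    '<a href="/user/test/u345">',   # contains a further slash
]
for tag in tags:
    print(tag, captured(loose, tag), captured(anchored, tag))
```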

Wrikken · Answer 5 · 2012-07-15T15:34:07.307

-2

This'll do it: replace ([^"]+) with:

([^"]*?[^0-9"][^"]*?)

edit: Unless Python is Quaint with a capital Q, I genuinely don't know what y'all see wrong. From the JavaScript console this works:

>>> 'user/user1234"'.match(/\/([^"]*?[^0-9"][^"]*?)"/);
Array ["/user1234"", "user1234"]
>>> 'user/1234"'.match(/\/([^"]*?[^0-9"][^"]*?)"/);
null

So, are you telling me this isn't the case in Python? Why?

edit2: aha, the optional /user fouls the results.... this'll prevent it:

 <a href(="(?:/user)?/(?!user/)([^"]*?[^0-9"][^"]*?))">
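
The effect of the optional `/user` can be reproduced in Python. In this sketch, `first_try` is the question's pattern with only the group replaced and `with_guard` is the edit2 pattern; both names and the helper `second_group` are hypothetical.

```python
import re

# Question's pattern with only the second group replaced (first suggestion).
first_try = re.compile(r'<a href(="(?:/user)?/([^"]*?[^0-9"][^"]*?))">')
# edit2: additionally forbid "user/" right after the slash.
with_guard = re.compile(r'<a href(="(?:/user)?/(?!user/)([^"]*?[^0-9"][^"]*?))">')

def second_group(regex, tag):
    """Return group 2 for a matching tag, or None when the tag is rejected."""
    m = regex.search(tag)
    return m.group(2) if m else None

numeric = '<a href="/user/15594243">'
# Without the guard the engine skips the optional /user and captures
# "user/15594243", which contains non-digits, so the tag still matches.
print(second_group(first_try, numeric))
print(second_group(with_guard, numeric))

for tag in ['<a href="/user/user1234">',
            '<a href="/user/15594243/add">',
            '<a href="/15594243">']:
    print(tag, '->', second_group(with_guard, tag))
```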

edited Jul 15 '12 at 15:34

answered Jul 15 '12 at 15:07

Wrikken


I think `/user/user567` is also valid – madfriend Jul 15 '12 at 15:09
Nope, your regex still matches the case which I posted – Aamir Rind Jul 15 '12 at 15:09
Well, `/user/user567` is matched positively @madfriend... Did you test this? – Wrikken Jul 15 '12 at 15:10
@AamirAdnan: it does not match `` here. Are you sure you applied the alteration right? What was the full regex you used to test this? – Wrikken Jul 15 '12 at 15:20
Sorry Wrikken, but it didn't work for me in Python; I used this as you suggested `` – Aamir Rind Jul 15 '12 at 15:23
AH, 't is because of that weird optional `(?:/user)?` which then matches... Total would be `(="(?:/user)?/(?!user/)([^"]*?[^0-9"][^"]*?))">` then, now I see the problem. – Wrikken Jul 15 '12 at 15:30
@Wrikken Will the last regex you posted work for me? – Aamir Rind Jul 15 '12 at 16:32
