2

I'm having a hard time grasping regex no matter how much documentation I read. I'm trying to match everything between a given string and the first occurrence of &. This is what I have:

link =  "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile("group\.do\?sys_id=(.?)&")
sysid = rex.search(link).groups()[0]

I'm using https://regex101.com/#python to help me validate my regex, and I can kind of get rex = re.compile("user_group.do?sys_id=(.*)&") to work, but the .* is greedy and matches up to the last &, whereas I want to match up to the first &.

I thought .? matches zero or one time.

Brian Smith

3 Answers

7

You don't necessarily need regular expressions here. Use urlparse instead:

>>> from urlparse import urlparse, parse_qs 
>>> parse_qs(urlparse(link).query)['sys_id'][0]
'69adb887157e450051e85118b6ff533c'

On Python 3, change the import to:

from urllib.parse import urlparse, parse_qs
alecxe
  • I've been on here too much, I didn't realize that was part of a url. Nice catch! – Brian Jun 13 '16 at 19:51
  • so is urlparse preferred over regex? – Brian Smith Jun 13 '16 at 19:53
  • 1
    @BrianSmith Generally speaking, URLs can be quite complicated, and parsing or validating them with regular expressions can be as hard as [that](http://stackoverflow.com/a/835527/771848). In other words, regular expressions are too general a tool for URL parsing; `urlparse` is designed specifically for that. – alecxe Jun 13 '16 at 19:58
  • I agree with @alecxe, his solution is much better for this task – Brian Jun 13 '16 at 20:01
2

You can simply match up to the && instead of the final &, like so:

import re
link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
# Raw string so the backslash escapes reach the regex engine unchanged
rex = re.compile(r"user_group\.do\?sys_id=(.*)&&")
sysid = rex.search(link).group(1)

print(sysid)
Brian
2
`.*` is greedy, but `.*?` is not.

`.?` only matches any single character zero or one times, while `.*?` matches as few characters as possible, stopping at the earliest point where the rest of the pattern can match. I hope that explains it.
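
To make the difference concrete, here is a minimal sketch that runs the quantifiers discussed above against the link from the question. The variable names are only illustrative, and the negated character class `[^&]*` is a common alternative that nobody in this thread suggested, so treat it as an extra rather than part of the original answer.

```python
import re

link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&&"

# Greedy: .* grabs as much as it can, so the match runs to the LAST &
greedy = re.search(r"sys_id=(.*)&", link).group(1)

# Lazy: .*? grabs as little as it can, so the match stops at the FIRST &
lazy = re.search(r"sys_id=(.*?)&", link).group(1)

# Alternative (not from the thread): match anything that is not an &
negated = re.search(r"sys_id=([^&]*)", link).group(1)

# .? allows at most one character between "sys_id=" and "&", so nothing matches
optional = re.search(r"sys_id=(.?)&", link)

print(greedy)    # 69adb887157e450051e85118b6ff533c&
print(lazy)      # 69adb887157e450051e85118b6ff533c
print(negated)   # 69adb887157e450051e85118b6ff533c
print(optional)  # None
```

The greedy result keeps a trailing & because `.*` only gives back characters until the final & in the pattern can match.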

Eric