How to regex url with two groups?

Question

I have this URL: s3://dev-datalake-cluster-bucket-q37evqefmksl/raw/wfm/users.11315

I need to extract the following values:

dev-datalake-cluster-bucket-q37evqefmksl
/raw/wfm/users.11315

So far I have tried the code below, but it keeps throwing errors:

pattern = re.compile('s3://(?)/(?)', response_content)
print ( re.match(pattern, response_content) )

See [How can I split a url string up into separate parts in Python?](http://stackoverflow.com/a/449811/3832970). — Wiktor Stribiżew, Mar 22 '17 at 11:32
`(?)` looks like you have vaguely guessed how regexes might work and now that your guess has failed you've given up and are asking for help. Read a tutorial or some documentation on regex. — Alex Hall, Mar 22 '17 at 11:32

anubhava · Accepted Answer · 2017-03-22T11:34:52.963

You can use a negated character class to capture both values:

^s3://([^/]+)/(.*)

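The bucket name is returned by capture group #1 and the rest of the path by capture group #2. The separating / is matched outside the groups, so the path comes back without its leading slash.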

Code:

>>> import re
>>> s = 's3://dev-datalake-cluster-bucket-q37evqefmksl/raw/wfm/users.11315'
>>> print(re.findall(r'^s3://([^/]+)/(.*)', s))
[('dev-datalake-cluster-bucket-q37evqefmksl', 'raw/wfm/users.11315')]

Regex Breakup:

^ - Start of string (start of each line only with re.MULTILINE)
s3:// - Match literal s3://
([^/]+) - Match 1 or more of any character that is not /
/ - Match literal /
(.*) - Match rest
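
The breakdown above can be tried out as follows. This is a minimal sketch, not part of the original answer: the names S3_URL, bucket, key and path are illustrative. Because the / separator sits outside the second group, the captured text has no leading slash, so the sketch prepends one to reproduce the output asked for in the question.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
S3_URL = re.compile(r'^s3://([^/]+)/(.*)')

url = 's3://dev-datalake-cluster-bucket-q37evqefmksl/raw/wfm/users.11315'
match = S3_URL.match(url)
if match is None:
    # re.match returns None when the pattern does not match.
    raise ValueError('not an s3:// URL: ' + url)

bucket = match.group(1)  # text between "s3://" and the next "/"
key = match.group(2)     # everything after that "/"
path = '/' + key         # leading slash restored, as in the question

print(bucket)
print(path)
```

Checking for None before calling group() avoids an AttributeError on input that does not start with s3://.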

edited Mar 22 '17 at 11:34

answered Mar 22 '17 at 11:31

anubhava

What about the second part? – Alex Hall Mar 22 '17 at 11:33

score 0 · Answer 2 · answered Mar 22 '17 at 11:37

You can use named groups together with the match object's groupdict() method:

>>> import re
>>> s = 's3://dev-datalake-cluster-bucket-q37evqefmksl/raw/wfm/users.11315'
>>> re_match = re.match(r's3://(?P<bucket>[^/]+)/(?P<item_path>.*)', s)
>>> re_match.groupdict()
{'bucket': 'dev-datalake-cluster-bucket-q37evqefmksl', 'item_path': 'raw/wfm/users.11315'}
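
If the URL might not match, re.match returns None and calling groupdict() on it fails. A small wrapper can guard against that. This is only a sketch: split_s3_url and S3_URL are hypothetical names, while the group names are the ones used above.

```python
import re

# Named groups as in the snippet above; compiled once for reuse.
S3_URL = re.compile(r's3://(?P<bucket>[^/]+)/(?P<item_path>.*)')

def split_s3_url(url):
    """Return {'bucket': ..., 'item_path': ...}, or None if url does not match."""
    m = S3_URL.match(url)
    return m.groupdict() if m else None

parts = split_s3_url('s3://dev-datalake-cluster-bucket-q37evqefmksl/raw/wfm/users.11315')
print(parts['bucket'])
print(parts['item_path'])

# A URL with a different scheme does not match, so the helper returns None.
print(split_s3_url('https://example.com/raw/wfm'))
```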

Pythex is a handy resource for regex.

shad0w_wa1k3r
