I would like to capture three groups (protocol, domain, page path) from this URL: http://www.interactivedynamicvideo.com/
I wrote this regex pattern: `pattern = r'(\w+)://([\w.-]+)/?(.+)'`. Since this URL is one of the values in my Series, I used `series.str.extract(pattern)` to capture the groups. I expected to get `http` for group 1, `www.interactivedynamicvideo.com` for group 2, and nothing for group 3. However, I got `/` in group 3. I thought the trailing `/` would be matched by `/?`. Could someone explain why `/` is captured by `(.+)` instead of being matched by `/?`?
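Here is a minimal reproduction of what I am seeing. It uses Python's `re` module on the single URL instead of my full Series, on the assumption that `re.search` treats this pattern the same way `series.str.extract` does. The second pattern, with `(.*)` in place of `(.+)`, is not part of my original code; it is only a hypothetical variant added for comparison.

```python
import re

url = "http://www.interactivedynamicvideo.com/"

# Pattern from the question, written with plain ASCII quotes.
pattern = r"(\w+)://([\w.-]+)/?(.+)"

# Hypothetical variant for comparison only: (.*) may match zero characters,
# whereas (.+) must match at least one.
variant = r"(\w+)://([\w.-]+)/?(.*)"

original_groups = re.search(pattern, url).groups()
variant_groups = re.search(variant, url).groups()

print(original_groups)  # group 3 is '/'
print(variant_groups)   # group 3 is the empty string
```

With the original pattern, group 3 comes back as `/`; with the variant, group 3 is empty, which suggests the difference has to do with `(.+)` requiring at least one character.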
Thank you for your time.