re.findall -> RegEx in Python

Question

import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase) 
print(x)

Result:

[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

I want something like:

[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

How can I do this?

Remove the first `/` and use non-capturing groups. `r'(?:https?://)?(?:www\.)?gamivo\.com\S*'`, see [this demo](https://regex101.com/r/phCIEr/1). — Wiktor Stribiżew, Jul 23 '21 at 09:16
Do you really need a regex for this? Split on spaces and keep the items that contain https. — leoOrion, Jul 23 '21 at 09:17
@leoOrion Yes, it's for a bigger project that needs a regex. In the final project I will use str.replace() to swap in a shortened link. — Diego, Jul 23 '21 at 09:22

score 1 · Answer 1 · answered Jul 23 '21 at 09:23

You need to:

- Remove the initial `/`. It forces every match to start at a slash, and because that slash comes after `http`/`https`, the `https://` prefix can never be part of the match.
- Remove the unnecessary outer capturing group and the `{1}` quantifier.
- Convert the optional capturing groups into non-capturing ones.
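To illustrate why the capturing groups matter, here is a minimal sketch of how `re.findall` changes its return shape. The sample string `text` and the variable names are made up for this illustration and are not from the question.

```python
import re

# Made-up sample text for illustration only.
text = "see https://www.gamivo.com/product/a and gamivo.com/product/b"

# With capturing groups, findall returns one tuple per match,
# holding one item per group (empty string if the group did not match).
with_groups = re.findall(r"(https?://)?(www\.)?(gamivo\.com\S*)", text)

# With only non-capturing groups, findall returns each whole match as a string.
whole_match = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)

print(with_groups)
# [('https://', 'www.', 'gamivo.com/product/a'), ('', '', 'gamivo.com/product/b')]
print(whole_match)
# ['https://www.gamivo.com/product/a', 'gamivo.com/product/b']
```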

See this Python demo:

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']

See the regex demo, too. Also, see the related post "re.findall behaves weird".
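Since the asker mentions replacing the links with shortened ones later, the same pattern could also be passed to `re.sub` with a callback. This is only a sketch: `shorten`, `replace_links`, and the `short.example` domain are hypothetical placeholders, not part of the question or of any real shortening service.

```python
import re

GAMIVO_URL = re.compile(r"(?:https?://)?(?:www\.)?gamivo\.com\S*")

def shorten(url: str) -> str:
    # Hypothetical placeholder: a real project would call its own
    # link shortener here. This one just keeps the last path segment.
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return "https://short.example/" + slug

def replace_links(text: str) -> str:
    # The callback receives each match object; group(0) is the whole URL.
    return GAMIVO_URL.sub(lambda m: shorten(m.group(0)), text)

frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text"
print(replace_links(frase))
# text https://short.example/sea-of-thieves-pc-xbox-one other text
```

Doing the replacement in one `re.sub` pass avoids calling `str.replace()` once per URL found.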

score 0 · Answer 2 · answered Jul 23 '21 at 12:43

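Try this. It matches every substring that starts with http:// or https:// and runs up to the next whitespace character (space or newline). Note that it is not restricted to gamivo.com links.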

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = re.findall(r'https?://\S*', frase)
print(x)
# ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
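As a rough comparison of the two answers, the sketch below runs both patterns on a string that also contains a link to another site. The sample string and the `example.com` URL are invented for this illustration.

```python
import re

# Made-up sample text containing one gamivo link and one unrelated link.
text = "a https://www.gamivo.com/product/x b https://example.com/page c"

# Generic pattern: any http(s) URL up to the next whitespace.
any_url = re.findall(r"https?://\S*", text)

# Domain-specific pattern: only links to gamivo.com.
only_gamivo = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)

print(any_url)
# ['https://www.gamivo.com/product/x', 'https://example.com/page']
print(only_gamivo)
# ['https://www.gamivo.com/product/x']
```

If the text can contain links to other sites, the domain-specific pattern is probably the safer choice.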
