Unable to get the full match in Regex

Asked Jan 25 '18 at 10:04

Active Jan 25 '18 at 19:29

Viewed 25 times

I am trying to extract the complete GitHub URL from some text, but I get back only part of the match and not the complete URL. I tested my regex on https://pythex.org/ and it shows the correct match result there.

import re

test = 'https://www.github.com/whoisthere'
GITHUB_PATTERN = r"(http(s?):\/\/|[a-zA-Z0-9\-]+\.|[github])[github\/~\-]+\.[a-zA-Z0-9\/~\-_,&=\?\.;]+[^\.,\s<]"
GITHUB_REGEX = re.compile(GITHUB_PATTERN, re.IGNORECASE)
# findall returns the captured groups, not the whole match, when the pattern has groups
github_regex_result = re.findall(GITHUB_REGEX, test)

if len(github_regex_result) > 0:
    print("GITHUB : {}".format(github_regex_result[0]))
else:
    print(None)

It returns the following:

GITHUB : ('https://', 's')

whereas I am trying to get the complete URL, like:

GITHUB : ('https://www.github.com/whoisthere')

New screenshot of issue

edited Jan 25 '18 at 19:29

asked Jan 25 '18 at 10:04

joel


Change all capturing groups to non-capturing. – Wiktor Stribiżew Jan 25 '18 at 10:06
Here is the updated regex - GITHUB_PATTERN = r"(?:http(s?):\/\/|[a-zA-Z0-9\-]+\.|[github])[github\/~\-]+\.[a-zA-Z0-9\/~\-_,&=\?\.;]+[^\.,\s<]" – joel Jan 25 '18 at 10:34
@WiktorStribiżew That fixed this issue, but because of the boolean OR the pattern ignores the github in the URL. How can I fix this? – joel Jan 25 '18 at 13:39
What boolean OR? – Wiktor Stribiżew Jan 25 '18 at 13:42
I have updated the question with a screenshot. When I search for twitter, it finds the other URLs as well. – joel Jan 25 '18 at 19:30
`[twitter]` matches a single letter out of the set: `t`, `w`, `i`, `e`, `r`. You need to write it without `[...]` – Wiktor Stribiżew Jan 25 '18 at 20:38
Thanks @WiktorStribiżew. Got that fixed – joel Jan 26 '18 at 05:54
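
To summarize the two points from the comments, here is a minimal sketch. The sample text and the simplified pattern are made up for illustration and are not the asker's final regex; the pattern assumes URLs of the form github.com/... with an optional scheme and an optional www. prefix. It shows that re.findall returns the captured groups when a pattern contains capturing groups, that non-capturing groups (?:...) make it return the full match, and that [github] is a character class matching a single letter rather than the word.

```python
import re

# Made-up sample text for illustration.
text = "See https://www.github.com/whoisthere, or https://twitter.com/someone."

# Capturing groups: findall returns a tuple of the groups, not the full URL.
capturing = re.compile(r"(https?://)(www\.)?github\.com/[^\s<]*[^\s<.,]", re.IGNORECASE)
groups_only = capturing.findall(text)

# Non-capturing groups and a literal "github": findall returns the full match.
non_capturing = re.compile(r"(?:https?://)?(?:www\.)?github\.com/[^\s<]*[^\s<.,]", re.IGNORECASE)
github_urls = non_capturing.findall(text)

# Alternative: keep the capturing groups and read the whole match via group(0).
full_matches = [m.group(0) for m in capturing.finditer(text)]

# A character class matches ONE character from the set, not the word.
single_letters = re.findall(r"[github]", "twitter")

print(groups_only)
print(github_urls)
print(single_letters)
```

If the groups are still needed for other purposes, the finditer variant is probably the more convenient choice, since group(0) always holds the whole match regardless of how many groups the pattern defines.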
