text = 'https://www.nytimes.com/2017/10/09/us/politics/corkers-blast-at-trump-has-other-republicans-nodding-in-agreement.html?rref=collection%2Fsectioncollection%2Fpolitics\r\n'

test = re.findall(r"^http* com$",text)

Result I have:

test = [ ]

The output I am expecting would be like this:

www.nytimes.com
Bhargav Desai
  • Welcome to Stack Overflow. Don't use the question title for your question description; put that in the description section. Your question title should be a summary of your problem (ideally) stated as a question (refer to [How to Ask](https://stackoverflow.com/help/how-to-ask) for examples and details). Also be sure to include all relevant question tags; as your question is about a non-matching regular expression; including the `regex` tag would have been a good choice to categorize your question further. – Ivo Mori Jul 28 '20 at 05:27
  • If you need a more reliable way to split a URL into parts, it is better to use the `urllib.parse` module instead of regular expressions. Check this [answer](https://stackoverflow.com/a/56476496/6682517). – Sergey Shubin Jul 28 '20 at 08:43
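
As a rough illustration of the `urllib.parse` suggestion in the comment above, here is a minimal sketch applied to the URL from the question. The variable names `parts` and `host` are only illustrative, and the `strip()` call is an assumption about how you want to handle the trailing `\r\n`.

```python
from urllib.parse import urlparse

text = 'https://www.nytimes.com/2017/10/09/us/politics/corkers-blast-at-trump-has-other-republicans-nodding-in-agreement.html?rref=collection%2Fsectioncollection%2Fpolitics\r\n'

# Drop the trailing "\r\n" before parsing
parts = urlparse(text.strip())

# netloc is the network-location part of the URL (host, plus port if present)
host = parts.netloc
print(host)  # www.nytimes.com
```

This avoids having to guess where the host ends, since the parser splits the URL into scheme, netloc, path and query for you.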

3 Answers


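Your regex pattern is wrong. `http*` means "htt" followed by zero or more "p", not "http followed by anything", and the literal space in the pattern has nothing to match in the URL. Remove the space, replace `*` with `.*`, and don't anchor the regex with `$` at the end, because the text continues after "com". Try this: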

>>> re.findall(r"^http.*?com", text)
['https://www.nytimes.com']
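
To make the difference concrete, here is a small sketch that runs the original pattern and the corrected one side by side on the URL from the question. The last pattern, with a capture group, is a variation added here (not part of the answer above) that drops the scheme so the result matches the expected output in the question.

```python
import re

text = 'https://www.nytimes.com/2017/10/09/us/politics/corkers-blast-at-trump-has-other-republicans-nodding-in-agreement.html?rref=collection%2Fsectioncollection%2Fpolitics\r\n'

# Original attempt: "p*" only repeats the letter p, and the space never matches
original = re.findall(r"^http* com$", text)

# Corrected pattern: ".*?" lazily consumes characters up to the first "com"
fixed = re.findall(r"^http.*?com", text)

# Variation (an assumption about what is wanted): capture only the host part
host_only = re.findall(r"^https?://(.*?com)", text)

print(original)   # []
print(fixed)      # ['https://www.nytimes.com']
print(host_only)  # ['www.nytimes.com']
```

When a pattern contains a capture group, `re.findall` returns the group's contents rather than the whole match, which is why the last result has no scheme.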
Prem Anand

You can also try this:

test = re.findall(r"www.+com",text)

Output:

['www.nytimes.com']
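
One caveat worth checking before relying on this: `.+` is greedy, so it runs to the last "com" in the text. That happens to be fine for the URL in the question, but the sketch below uses a hypothetical URL (`www.example.com/welcome`, not from the question) to show how it can overshoot, along with a lazy variant with escaped dots as one possible fix.

```python
import re

# Hypothetical URL whose path contains "com" (inside "welcome")
sample = "https://www.example.com/welcome"

# Greedy: ".+" grabs as much as possible, then backtracks to the LAST "com"
greedy = re.findall(r"www.+com", sample)

# Lazy with escaped dots: stops at the FIRST ".com" after "www."
lazy = re.findall(r"www\..+?\.com", sample)

print(greedy)  # ['www.example.com/welcom']
print(lazy)    # ['www.example.com']
```

Note that both versions only handle hosts that start with "www" and end in ".com".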
Bhargav Desai

This matches both http and https, and works with any kind of domain (.gov.us, .com.de, .edu, ...):

test = re.findall(r"^http.*\:\/\/(.*?)\/",text)

^http = beginning with http

^http.* = http followed by any characters, which covers both http and https

\:\/\/ = escaped ://

(.*?) = capture group, i.e. the part you want (without the ?, it would match up to the last /)

\/ = the first occurrence of "/" after the ://
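
The breakdown above can be tried out with the sketch below. It uses a slightly tighter variant of the pattern, `^https?://([^/]+)`, which is a substitution made here rather than the pattern from this answer: it does not need a "/" after the host, so it also handles URLs with no path. The helper name `extract_host` and the example URLs other than the one from the question are hypothetical.

```python
import re

# "https?" matches http or https; "[^/]+" captures everything up to the next "/"
HOST_PATTERN = re.compile(r"^https?://([^/]+)")

def extract_host(url):
    """Return the host part of an http(s) URL, or None if it does not match."""
    match = HOST_PATTERN.search(url)
    return match.group(1) if match else None

print(extract_host('https://www.nytimes.com/2017/10/09/us/politics/'))  # www.nytimes.com
print(extract_host('http://example.gov.us'))                            # example.gov.us
print(extract_host('ftp://example.com/'))                               # None
```

With the pattern from the answer, the second URL would return no match, because there is no "/" after the host for the final `\/` to match.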

Joao Vitorino