Python Regex URL from Website / Raw Website without Https & Http

Question

I have Python code like this:

#!/usr/bin/python3
from urllib.parse import urlparse

url = 'https://pastebin.com/raw/EgGZmEqY'
parsed = urlparse(url)
site = parsed.netloc  # host part of the URL: 'pastebin.com'
print(site)

Whether the site is a raw page or not, I want to grab just the site name without https, http, or www. For example, the raw page contains entries like the ones below, and from each of them I want to get just example.com:

https://example.com
http://example.com
www.example.com
example.com

How can I get the domain without https, http, and www? Thank you!

kenjoe41 · Answer 1 · 2018-09-15T11:29:53.790

1

I take it that you just want the registered domain name (for example cnn.com, not merely the TLD com), without the subdomains or scheme.

According to this Stack Overflow answer, it seems all you need is tldextract:

import tldextract
tldextract.extract('http://forums.news.cnn.com/')
# ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')

In your case, I would use this:

#!/usr/bin/env python3
import tldextract

url = 'https://www.pastebin.co.uk/raw/EgGZmEqY'

# Split the URL into subdomain ('www'), domain ('pastebin'), and suffix ('co.uk').
parsed = tldextract.extract(url)
domain = parsed.domain + '.' + parsed.suffix

print(domain)  # pastebin.co.uk
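
If installing tldextract is not an option, the four forms from the question can also be normalised with the standard library alone. The following is a minimal sketch, not part of the original answer: bare_host is a hypothetical helper name, and it only strips the scheme, port, path, and a leading www., so other subdomains (for example forums.news.cnn.com) are left in place.

```python
import re
from urllib.parse import urlparse

# Matches a leading scheme such as 'http://' or 'https://'.
SCHEME_RE = re.compile(r'^[a-z][a-z0-9+.-]*://', re.IGNORECASE)


def bare_host(url):
    """Return the host of `url` without scheme, port, path, or leading 'www.'.

    Hypothetical helper for illustration only.
    """
    url = url.strip()
    # urlparse only fills in the host when a scheme or '//' is present,
    # so prefix '//' for inputs such as 'www.example.com'.
    if not SCHEME_RE.match(url):
        url = '//' + url
    host = urlparse(url).hostname or ''
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host


for candidate in ('https://example.com', 'http://example.com',
                  'www.example.com', 'example.com'):
    print(bare_host(candidate))  # example.com each time
```

Unlike tldextract, this does not consult the public suffix list, so it cannot tell that co.uk is a suffix rather than a domain; it simply reports whatever host the URL names.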

edited Sep 15 '18 at 11:29

answered Sep 15 '18 at 10:16

kenjoe41


You should provide code which works with the OP's exact data. Cutting and pasting from another question doesn't help much. – Tim Biegeleisen Sep 15 '18 at 10:18
But that handles just one domain. How can I grab the domains from a raw page or another website, like the one in my pastebin link? – Rai Sep 15 '18 at 11:20
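
For the follow-up about a raw page that lists many sites, one option is to apply a regular expression to each whitespace-separated token of the page text. This is a rough sketch under assumptions: HOST_RE and hosts_in_text are illustrative names, the page body is assumed to be already downloaded into a string, and the pattern is deliberately simple, so it will also accept dotted tokens that are not real hosts.

```python
import re

# Optional http(s) scheme, optional 'www.', then a dotted host name (captured).
HOST_RE = re.compile(
    r'(?:https?://)?(?:www\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)',
    re.IGNORECASE,
)


def hosts_in_text(text):
    """Return the unique hosts found in `text`, in order of first appearance.

    Hypothetical helper: each whitespace-separated token is matched from its
    start, so path segments such as '/raw/file.txt' are never treated as hosts.
    """
    hosts = []
    for token in text.split():
        match = HOST_RE.match(token)
        if match:
            host = match.group(1).lower()
            if host not in hosts:
                hosts.append(host)
    return hosts


sample = """https://example.com
http://example.com
www.example.com
example.com
https://pastebin.com/raw/EgGZmEqY"""

print(hosts_in_text(sample))  # ['example.com', 'pastebin.com']
```

The text of the raw page would have to be fetched first (for instance with urllib.request or requests) and passed in as the text argument.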
