I have been using a regex that searches a document for all URLS and replaces them but now I want to only replace the hostname, not the subdomain or any other part of the URL.
For example I want https://ftp.website.com > https://ftp.mything.com
This is a tool I am writing to sanitize documents and am fairly new to some of this. Any help would be greatly appreciated. Thanks!
This is my quick and dirty find and replace so far:
import fileinput
import re
for line in fileinput.input():
line = re.sub(
r'^(?:http:\/\/|www\.|https:\/\/)([^\/]+)',
r'client.com', line.rstrip())
line = re.sub(
r'\b(\d{1,3}\.){2}\d{1,3}\b',
r'1.33.7', line.rstrip())
print(line)
I realize that URL parse can accomplish this but I want this to find the URLs in the document and I do not want to supply them. Maybe I just need help using regex to find the urls and passing that to urlparse to remove the parts I want. Hope this clarifies.