For your particular input, you could use str.partition() or str.split():
print('youtube.com/video/AiL6nL'.partition('/')[0])
# -> youtube.com
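The str.split() variant mentioned above could look like the following minimal sketch. Passing maxsplit=1 is a choice made here so that splitting stops at the first slash; the variable names are only for illustration.

```python
url = 'youtube.com/video/AiL6nL'
# maxsplit=1: split once at the first '/', keep everything before it
host = url.split('/', 1)[0]
print(host)
# -> youtube.com
```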
Note: the urlparse module (which you could use in general to parse a URL) doesn't work in this case, because without a scheme or a leading // the whole string is treated as the path:
import urlparse
urlparse.urlsplit('youtube.com/video/AiL6nL')
# -> SplitResult(scheme='', netloc='', path='youtube.com/video/AiL6nL',
# query='', fragment='')
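One possible workaround, sketched here under the assumption that no input line already carries a scheme, is to prepend '//' so that urlsplit treats the first component as the network location. The sketch uses Python 3, where the urlparse module lives at urllib.parse; hostname_of is a hypothetical helper name, not part of the library.

```python
from urllib.parse import urlsplit  # Python 3 location of the urlparse module

def hostname_of(line):
    """Hypothetical helper: parse a scheme-less 'host/path' string."""
    # The leading '//' marks what follows as the network location (netloc).
    return urlsplit('//' + line.strip()).hostname

print(hostname_of('youtube.com/video/AiL6nL'))
# -> youtube.com
```

The .hostname attribute also lowercases the host and drops any :port suffix, which may or may not be what you want.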
In general, it is safe to use a regex here if you know that every line starts with a hostname and is otherwise a well-formed URI:
import re
print("\n".join(re.findall(r"(?m)^\s*([^\/?#]*)", text)))
youtube.com
yahoo.com
youtube.com
google.com
youtube.com
Note: it doesn't remove the optional port part (host:port).
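If the port should go as well, one option is to post-process each match. The following is a rough sketch: strip_port is a hypothetical helper, the sample text is made up for illustration, and userinfo such as user:pass@host is not handled.

```python
import re

def strip_port(netloc):
    """Hypothetical helper: drop a trailing ':port' if one is present."""
    host, sep, port = netloc.rpartition(':')
    # Only strip when the part after the last colon is purely numeric.
    return host if sep and port.isdigit() else netloc

# Made-up sample input; every line has a path after the host.
text = "youtube.com/video/AiL6nL\nyahoo.com:8080/news\n  google.com/search?q=x"

hosts = [strip_port(h) for h in re.findall(r"(?m)^\s*([^\/?#]*)", text)]
print("\n".join(hosts))
```

Splitting on the last colon rather than the first keeps bracketed IPv6 literals such as [::1]:80 intact apart from the port.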