I am trying to write a regex in Python to extract ONLY the domains from the set of URLs at the bottom of this post. I have been using https://regexr.com/ to test my regex before applying Series.str.extract(). So far I have gotten VERY close, but it looks like the first character (the first 'w' in www, where there is one) is not being captured. The regex I have so far is this:
[^\/\/](\w*.\w*.com|\w*.\w*.org|\w*.\w*.cc|\w*.\w*.ly)
How can I modify this to go from http://css-cursor.techstream.org to only css-cursor.techstream.org? The full set of URLs is:
'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
'http://www.interactivedynamicvideo.com/',
'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
'http://evonomics.com/advertising-cannot-maintain-internet-heres-solution/',
'HTTPS://github.com/keppel/pinn',
'Http://phys.org/news/2015-09-scale-solar-youve.html',
'https://iot.seeed.cc',
'http://www.bfilipek.com/2016/04/custom-deleters-for-c-smart-pointers.html',
'http://beta.crowdfireapp.com/?beta=agnipath',
'https://www.valid.ly?param',
'http://css-cursor.techstream.org'
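
To make the problem easier to reproduce, here is a minimal sketch that runs the regex above against a few of the URLs. It uses the standard-library re module rather than pandas so that it is self-contained. My understanding is that Series.str.extract() is built on the same regex engine, so I would expect the same captures there, but that is an assumption. The names pattern, urls and first_group are only for illustration.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"[^\/\/](\w*.\w*.com|\w*.\w*.org|\w*.\w*.cc|\w*.\w*.ly)")

# A few of the URLs from the list above.
urls = [
    "http://www.interactivedynamicvideo.com/",
    "HTTPS://github.com/keppel/pinn",
    "https://iot.seeed.cc",
]


def first_group(url):
    """Return the first capture group of the first match, or None."""
    match = pattern.search(url)
    return match.group(1) if match else None


results = [first_group(u) for u in urls]
for url, got in zip(urls, results):
    print(url, "->", got)
# http://www.interactivedynamicvideo.com/ -> ww.interactivedynamicvideo.com
# HTTPS://github.com/keppel/pinn -> ithub.com
# https://iot.seeed.cc -> ot.seeed.cc
```

In every case the first character of the domain is missing from the captured group, including for URLs that do not start with www.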