I'm writing Python code that processes a block of text which, besides text that is useless to me, contains URLs. From that text I only need the domains, not the full URLs. Example input:
47.91.158.176 or 54.145.185.110 port 80 - gooolgeremf.top - GET /search.php
47.90.205.113 or 35.187.59.173 port 80 - voperforseanx.top/site/chrome_update.html
So here I only need gooolgeremf.top and voperforseanx.top to be matched, but the regex I've written also matches search.php and update.html (out of chrome_update.html).
My idea is that the regex should stop matching after a /. However, I don't know how to implement that, and especially how to do it without preventing matches on domains that appear after the first / in the text file.
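One way to get the "stop at /" behaviour without affecting later matches is to apply the cut per whitespace-separated token instead of to the whole text. This is a minimal sketch, assuming domains are separated from the surrounding text by whitespace; extract_domains and HOST_RE are hypothetical names, and the hostname rule (dot-separated labels ending in an alphabetic TLD) is an assumption chosen so that bare IP addresses are skipped.

```python
import re

# Assumed "looks like a hostname" rule: dot-separated labels ending in an
# alphabetic TLD of two or more letters, so bare IPs are skipped.
HOST_RE = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def extract_domains(text):
    """Return hostnames, cutting each whitespace-separated token at its first '/'."""
    domains = []
    for token in text.split():
        host = token.split("/", 1)[0]  # only this token is cut; later tokens are unaffected
        if HOST_RE.fullmatch(host):
            domains.append(host)
    return domains


sample = """\
47.91.158.176 or 54.145.185.110 port 80 - gooolgeremf.top - GET /search.php
47.90.205.113 or 35.187.59.173 port 80 - voperforseanx.top/site/chrome_update.html
"""

print(extract_domains(sample))  # ['gooolgeremf.top', 'voperforseanx.top']
```

Because the cut happens per token, a / in one URL has no effect on domains that come later in the file. Since fullmatch checks the whole token, a domain with trailing punctuation (for example "example.com,") would be skipped unless that punctuation is stripped first.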
Here is how my code works so far:
import re
regexdm = r"[A-Za-z0-9]{1,}\.[A-Za-z0-9]{1,10}\.?[A-Za-z]{1,}\.?[A-Za-z]{1,}"
# iocsd is the input file, already opened for reading
dmsc = re.findall(regexdm, iocsd.read())
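If a single re.findall call is preferred, a possible variant of the pattern adds a negative lookbehind so that a match cannot begin right after a /, a dot, a hyphen, or a word character. That rules out search.php (preceded by /) and update.html (preceded by _), while the domain before the first / still matches. DOMAIN_RE and its exact character classes are assumptions for illustration, not a complete hostname grammar.

```python
import re

# Assumed pattern: a match may not start right after a word character, '.',
# '/' or '-', so path pieces such as "search.php" are skipped.
DOMAIN_RE = re.compile(
    r"(?<![\w./-])"          # not inside a word, a path, or a longer hostname
    r"(?:[A-Za-z0-9-]+\.)+"  # one or more labels, e.g. "gooolgeremf."
    r"[A-Za-z]{2,}"          # alphabetic TLD, e.g. "top" (rules out bare IPs)
    r"(?![\w-])"             # do not stop in the middle of a label
)

sample = """\
47.91.158.176 or 54.145.185.110 port 80 - gooolgeremf.top - GET /search.php
47.90.205.113 or 35.187.59.173 port 80 - voperforseanx.top/site/chrome_update.html
"""

print(DOMAIN_RE.findall(sample))  # ['gooolgeremf.top', 'voperforseanx.top']
```

This is still a heuristic: a filename preceded by a space instead of a / (for example "GET search.php") would be matched as if it were a domain.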