I am trying to extract a list of domain names from HTTrack output using grep. It is close to working, but the results still include subdomains.
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | grep -iEo "([0-9a-z.-]+)\.(com)"
Here is my current example result:
- domain1.com
- domain2.com
- www.domain3.com
- subdomain.domain4.com
- whatever.domain5.com
Here is my desired example result:
- domain1.com
- domain2.com
- domain3.com
- domain4.com
- domain5.com
Is there something I can add to this grep expression, or should I pipe its output through sed to strip the subdomains? If so, how would I do that? I'm stuck. Any help is much appreciated.
Regards,
Wyatt
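
One possible approach, offered as a rough sketch rather than a definitive fix, is to leave the grep expression alone and post-process its output with sed. The function name `strip_subdomains` is made up for illustration, and the sketch assumes one hostname per line, that only `.com` names matter (as in the question), and that the sed in use accepts `-E`.

```shell
# Hypothetical helper (name chosen for illustration): reads hostnames on
# stdin, one per line, and keeps only the last label before ".com".
strip_subdomains() {
  # Lowercase first, because grep -i may pass through mixed-case matches.
  tr '[:upper:]' '[:lower:]' |
    # Greedy ".*\." consumes every leading label; lines with no subdomain
    # do not match the pattern and pass through unchanged.
    sed -E 's/^.*\.([^.]+\.com)$/\1/'
}

# Sample input modelled on the hostnames from the question:
printf '%s\n' domain1.com www.domain3.com subdomain.domain4.com | strip_subdomains
# prints:
# domain1.com
# domain3.com
# domain4.com
```

To use it with the real data, append `| strip_subdomains` to the end of the httrack and grep pipeline above. Note that this only handles single-label suffixes such as `.com`; names under suffixes like `.co.uk` would need a different rule.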