I have a file with a list of URLs (one domain per line). I want to fetch each page (not recursively) and extract two patterns that appear on different lines of the HTML. After two days of trying, my head is spinning …
This is the relevant part of the HTML:
<a href="http://subdomain.domain.tld/">Home</a>
</li>
<li>
<a data-uv-trigger='true' href='mailto:john@doe.com'>
I need to extract the domain (subdomain.domain.tld) and the email address (john@doe.com). With wget and sed I can get the two parts in two separate steps:
wget -O - -i urls-to-scan-manuell.txt | sed -n "s/\(.*a href=\"\)\(.*\)\(\">Home.*\)/\2/p"
wget -O - -i urls-to-scan-manuell.txt | sed -n "s/\(.*true' href='mailto:\)\(.*\)\('>.*\)/\2/p"
But I would like to extract both parts in one pass and write them to a file on a single line, separated by a space. It is the multi-line handling in sed that drives me nuts.
Could you please help me? :)
Thank you in advance, Rainer.
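
For reference, here is a minimal sketch of one way the single-pass extraction described above might be done with sed's hold space. It is not a definitive solution. It assumes that each page has exactly one "Home" link with an absolute URL in double quotes, followed on a later line by one mailto link in single quotes, as in the HTML excerpt. The function name extract_pairs is made up for illustration.

```shell
# Hypothetical helper: reads HTML on stdin and prints "domain email" pairs.
# Assumes the Home link comes before the mailto link and on a different line.
extract_pairs() {
  sed -n "
    # Home link: reduce the line to the host name and store it in the hold space
    /a href=\"[^\"]*\">Home/ {
      s|.*a href=\"[a-z]*://\([^/\"]*\)[^\"]*\">Home.*|\1|
      h
    }
    # mailto link: reduce the line to the address, append it to the stored
    # host name, join the two with a space and print the result
    /href='mailto:/ {
      s/.*href='mailto:\([^']*\)'.*/\1/
      H
      g
      s/\n/ /
      p
    }
  "
}
```

It could then be used as: wget -q -O - -i urls-to-scan-manuell.txt | extract_pairs > result.txt (result.txt is just an example name). The hold space is not cleared between pages, so a page that has a mailto link but no Home link would be paired with the previous page's domain.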