I have a text file that contains the source code of a web page. I want to extract every link that contains "https://ANYTHING.amazonaws.com" and write those links to a new file.

The new file should contain:

https://test-ok.amazonaws.com
https://hhhhh.hhhh.amazonaws.com
https://anything.dd.dd.amazonaws.com

The links don't have to be in a specific tag; they can appear anywhere in the source, inside any tag.

Thanks!

nora

1 Answer

You can use grep to search for a regex pattern, with the -o flag to print only the matching fragments (one per line), and then redirect the output to a new file.

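In your case something like this should work. It uses a bracket expression rather than .* because .* is greedy: on a line with several links it would match everything from the first https:// to the last .amazonaws.com as a single result.

grep -oE 'https://[A-Za-z0-9.-]+\.amazonaws\.com' sourcecode.html > newfile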
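
To see how this behaves when one line holds more than one link, here is a small sketch you can run in an empty directory. The file names sample.html and links.txt and the sample page content are made up for illustration, and the sort -u step is an optional addition (not part of the question) that drops repeated links. The pattern limits the match to hostname characters instead of using .* so each link is reported on its own.

```shell
# Build a small sample page (hypothetical content) to try the pattern on.
cat > sample.html <<'EOF'
<a href="https://test-ok.amazonaws.com/index.html">one</a> <img src="https://hhhhh.hhhh.amazonaws.com/a.png">
<script>var u = "https://anything.dd.dd.amazonaws.com";</script>
<a href="https://example.com/">not matched</a>
<a href="https://test-ok.amazonaws.com/other">same host again</a>
EOF

# -o prints only the matched text; -E enables extended regex so + works.
# [A-Za-z0-9.-]+ matches hostname characters only, so the match cannot
# run across quotes, spaces, or slashes into the next link.
# sort -u removes duplicate links (optional).
grep -oE 'https://[A-Za-z0-9.-]+\.amazonaws\.com' sample.html | sort -u > links.txt

cat links.txt
```

Note that only the scheme and host are kept; anything after .amazonaws.com (such as /index.html) is cut off, which matches the output format asked for in the question.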

A regular expression syntax cheatsheet is handy if you need to adjust the pattern.