How to extract a pattern from a file and append it to another on Linux

Question

I have a text file that contains the source code of a web page. I want to extract every link that contains "https://ANYTHING.amazonaws.com" into a new file.

The new file will contain:

https://test-ok.amazonaws.com
https://hhhhh.hhhh.amazonaws.com
https://anything.dd.dd.amazonaws.com

The links don't have to be in a specific tag; they can appear anywhere, in any tag.

Thanks!

score 0 · Answer 1 · answered Sep 03 '22 at 14:44

You can use grep with a regex pattern and the -o flag, which prints only the matching fragments (one per line), and then redirect the output to a new file.

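In your case, something like this should work:

grep -oE 'https://[A-Za-z0-9.-]+\.amazonaws\.com' sourcecode.html >> newfile

The character class keeps each match inside a single hostname; a greedy .* would instead run from the first https:// on a line to the last .amazonaws.com on that line. The >> operator appends to newfile (creating it if needed); use > to overwrite it.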

A regular expression syntax cheatsheet is a handy reference if you need to adjust the pattern.
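One thing to watch for is several links sharing one line, which is common in minified page source. A pattern built on .* is greedy and would swallow everything between the first https:// and the last .amazonaws.com on that line. The sketch below is one way to check the behaviour on a small made-up input. The sample HTML, the temporary directory, and the sort -u deduplication step are assumptions added for illustration, not part of the original question.

```shell
# Work in a throwaway directory so no real files are touched (illustrative setup).
tmp=$(mktemp -d)

# Hypothetical sample page: two matching links on one line, one on another,
# and one link that should NOT match.
cat > "$tmp/sourcecode.html" <<'EOF'
<a href="https://test-ok.amazonaws.com/x">a</a> <img src="https://hhhhh.hhhh.amazonaws.com/i.png">
<script src="https://anything.dd.dd.amazonaws.com/app.js"></script>
<a href="https://example.com/">other</a>
EOF

# -o prints each match on its own line, -E enables extended regex syntax.
# [A-Za-z0-9.-]+ only matches hostname characters, so a match cannot
# spill past the quote or slash that ends the host.
# sort -u drops duplicates; >> appends to newfile instead of overwriting it.
grep -oE 'https://[A-Za-z0-9.-]+\.amazonaws\.com' "$tmp/sourcecode.html" \
  | sort -u >> "$tmp/newfile"

cat "$tmp/newfile"
```

After running this, newfile should list the three amazonaws.com hosts and nothing from example.com. Drop the sort -u stage if you want to keep duplicates or preserve the order in which links appear in the page.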
