How can I execute this grep command

Question

I was trying to match these anchor tags on regex101:

<a href="http://google.com">Google.com</a>
<A target="_blank" href='http://example.com/files.html'>An Example</A>
<a id="link23" HREF = "file23.html" target="_TOP">File #23</a>
<a href="images/mypic.png">See my picture!</a>
<a href="mailto:joelross@uw.edu">Email Joel</a>

and I came up with this regex: <[aA].*\s(HREF|href)\s?=\s?('|").*('|")>.*<\/[aA]>

Now, when I try to run the grep command from my command line, it throws an error:

./mdlinks.sh: line 3: unexpected EOF while looking for matching `"'
./mdlinks.sh: line 4: syntax error: unexpected end of file

Here is the source file

#! /usr/bin/env bash
CONTENT=$(curl $1)
echo "$CONTENT" | grep -E -o '<[aA].*\s(HREF|href)\s?=\s?('|").*('|")>.*<\/[aA]>' >> mdlinks.txt

http://stackoverflow.com/questions/1881237/easiest-way-to-extract-the-urls-from-an-html-page-using-sed-or-awk-only — MattSizzle, Apr 11 '16 at 17:26
use xmllint with an xpath query: http://xmlsoft.org/xmllint.html — Casimir et Hippolyte, Apr 11 '16 at 18:47

score 1 · Accepted Answer · answered Apr 11 '16 at 17:30

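You need to escape the single quotes in the regex. Inside a single-quoted string, an unescaped ' ends the string early and leaves the quotes unbalanced, which is why bash reports an unexpected EOF. Your shebang also has an extra space (although that's just style):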

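#!/usr/bin/env bash
# Fetch the page whose URL is passed as the first argument.
CONTENT=$(curl "$1")
# '\'' closes the single-quoted string, adds a literal ', and reopens the string.
echo "$CONTENT" | grep -E -o '<[aA].*\s(HREF|href)\s?=\s?('\''|").*('\''|")>.*<\/[aA]>' >> mdlinks.txt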
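The `'\''` sequence can look opaque, so here is a minimal sketch of what the shell does with it. The variable name `quote_group` is only illustrative and holds just the quote-matching group from the regex.

```shell
#!/usr/bin/env bash
# A single-quoted string cannot contain a single quote, so '\'' does three things:
#   '   closes the current single-quoted string
#   \'  adds one literal single quote (escaped, outside any quotes)
#   '   opens a new single-quoted string
# The shell joins the adjacent pieces into one word.
quote_group='('\''|")'

# Print the resulting text to confirm what grep would actually receive.
printf '%s\n' "$quote_group"
```

Running this prints `('|")`, which is the group as it appears in the original regex.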

It might be worth using double quotes for the regex, rather than single quotes. You'll still have to escape the double quotes inside the expression, but escaping double quotes is a little cleaner:

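#!/usr/bin/env bash
# Fetch the page whose URL is passed as the first argument.
CONTENT=$(curl "$1")
# Inside double quotes only the inner " needs a backslash; ' is literal.
echo "$CONTENT" | grep -E -o "<[aA].*\s(HREF|href)\s?=\s?('|\").*('|\")>.*<\/[aA]>" >> mdlinks.txt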
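To check the quoting without fetching a page, the pattern can be kept in a variable and run against the sample lines from the question. This is only a sketch: the names `link_regex`, `extract_links`, and `sample` are made up for illustration, and `\s` has been swapped for the POSIX class `[[:space:]]` on the assumption that the script should not depend on GNU extensions.

```shell
#!/usr/bin/env bash
# Double-quoted form of the pattern: only the inner double quotes need a backslash.
# [[:space:]] stands in for \s (an assumption made for portability).
link_regex="<[aA].*[[:space:]](HREF|href)[[:space:]]?=[[:space:]]?('|\").*('|\")>.*</[aA]>"

# extract_links (hypothetical helper): read HTML on stdin, print each matching anchor.
extract_links() {
  grep -E -o "$link_regex"
}

# Sample input copied from the question.
sample=$(cat <<'EOF'
<a href="http://google.com">Google.com</a>
<A target="_blank" href='http://example.com/files.html'>An Example</A>
<a id="link23" HREF = "file23.html" target="_TOP">File #23</a>
<a href="images/mypic.png">See my picture!</a>
<a href="mailto:joelross@uw.edu">Email Joel</a>
EOF
)

printf '%s\n' "$sample" | extract_links
```

With one anchor per line, as here, all five sample lines are printed back unchanged.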

user2926055


Thank you so much for the reply. But I am still facing problems: mdlinks.txt contains just 1 matched anchor tag, not all the anchor tags present in the file. – Aakash Tiwari Apr 11 '16 at 17:55
That's a problem with your regex. Try using non-greedy matches (`*?`) instead of greedy matches (`*`, which is the default behavior). Note that `grep -E` does not support `*?`; it needs a PCRE-capable mode such as GNU `grep -P`. – user2926055 Apr 11 '16 at 18:18
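POSIX extended regular expressions have no lazy quantifiers, so one way to get the same effect while staying with `grep -E` is to replace `.*` with negated character classes. The sketch below is an assumption about what the page looks like (several anchors on one line, as in minified HTML). The names `anchor_regex`, `extract_anchors`, and `one_line` are illustrative, and the pattern will not match link text that contains nested tags.

```shell
#!/usr/bin/env bash
# Negated classes keep each part of the match inside one tag:
#   [^>]*  stays within the opening tag, [^<]* stays within the link text.
anchor_regex='<[aA][^>]*[[:space:]](HREF|href)[[:space:]]*=[[:space:]]*("[^"]*"|'\''[^'\'']*'\'')[^>]*>[^<]*</[aA]>'

# extract_anchors (hypothetical helper): print every anchor found on stdin, one per line.
extract_anchors() {
  grep -E -o "$anchor_regex"
}

# Two anchors on a single line, as in minified HTML (made-up sample).
one_line='<p><a href="a.html">A</a> and <A target="_blank" href='\''b.html'\''>B</A></p>'

printf '%s\n' "$one_line" | extract_anchors
```

On this input the two anchors are printed on separate lines, whereas a greedy `.*` would return a single match running from the first `<a` to the last `</A>`.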
