
I was trying to match these anchor tags on regex101:

<a href="http://google.com">Google.com</a>
<A target="_blank" href='http://example.com/files.html'>An Example</A>
<a id="link23" HREF = "file23.html" target="_TOP">File #23</a>
<a href="images/mypic.png">See my picture!</a>
<a href="mailto:joelross@uw.edu">Email Joel</a>

and I wrote this regex:

<[aA].*\s(HREF|href)\s?=\s?('|").*('|")>.*</[aA]>

Now, when I try to use it with grep from the command line, it throws an error:

./mdlinks.sh: line 3: unexpected EOF while looking for matching `"'
./mdlinks.sh: line 4: syntax error: unexpected end of file

Here is the script, mdlinks.sh:

#! /usr/bin/env bash
CONTENT=$(curl $1)
echo "$CONTENT" | grep -E -o '<[aA].*\s(HREF|href)\s?=\s?('|").*('|")>.*<\/[aA]>' >> mdlinks.txt
Aakash Tiwari

1 Answer


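You need to escape the single quotes in the regex. A single-quoted string in Bash cannot contain a single quote, so the first ' inside ('|") ends the quoted string early and the rest of the line is parsed as shell syntax, leaving the quotes unbalanced; that is where the unexpected EOF error comes from. Write each literal single quote as '\'' (close the string, add an escaped quote, reopen the string). Also, your shebang has an extra space (although that's just style):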

#!/usr/bin/env bash
CONTENT=$(curl "$1")
echo "$CONTENT" | grep -E -o '<[aA].*\s(HREF|href)\s?=\s?('\''|").*('\''|")>.*<\/[aA]>' >> mdlinks.txt
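
To see the '\'' trick in isolation, here is a minimal sketch. The variable names quote_class and line are illustrative only, the sample line comes from the question, and it assumes GNU grep because \s is a GNU extension. The closing tag is written as plain </[aA]>, since / needs no escaping in a grep regex.

```shell
#!/usr/bin/env bash
# '\'' ends the single-quoted string, adds an escaped literal quote (\'),
# and starts a new single-quoted string; Bash joins the pieces into one word.
quote_class='('\''|")'
printf '%s\n' "$quote_class"    # prints ('|")

# The same trick inside the grep -E pattern, tested on one sample line
# (assumes GNU grep, which understands \s).
line="<A target=\"_blank\" href='http://example.com/files.html'>An Example</A>"
printf '%s\n' "$line" | grep -E -o '<[aA].*\s(HREF|href)\s?=\s?('\''|").*('\''|")>.*</[aA]>'
```

The grep call prints the whole sample line, since the entire anchor matches.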

It might be worth using double quotes for the regex instead of single quotes. You'll still have to escape the double quotes inside the expression, but \" can be written directly inside a double-quoted string, which is a little cleaner than closing and reopening the string:

#!/usr/bin/env bash
CONTENT=$(curl "$1")
echo "$CONTENT" | grep -E -o "<[aA].*\s(HREF|href)\s?=\s?('|\").*('|\")>.*<\/[aA]>" >> mdlinks.txt
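
Both quoting styles hand grep exactly the same regex; only the shell syntax differs. A quick sketch to confirm it (the names single and double are just for illustration). Inside double quotes, \" becomes a literal ", while \s is passed through unchanged, because a backslash is only special there before $, `, ", \ or a newline.

```shell
#!/usr/bin/env bash
# Pattern built with single quotes, using '\'' for each literal single quote.
single='<[aA].*\s(HREF|href)\s?=\s?('\''|").*('\''|")>.*</[aA]>'

# Same pattern built with double quotes, using \" for each literal double quote.
double="<[aA].*\s(HREF|href)\s?=\s?('|\").*('|\")>.*</[aA]>"

if [ "$single" = "$double" ]; then
  echo "identical"    # this branch runs
else
  echo "different"
fi
```

One caveat with the double-quoted form: $ and backticks keep their special meaning inside double quotes, so a pattern that contains them may need extra escaping.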
user2926055
  • Thank you so much for the reply. But I am still facing a problem: mdlinks.txt contains only one anchor tag, not all the anchor tags in the file – Aakash Tiwari Apr 11 '16 at 17:55
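  • That's a problem with your regex: .* is greedy (the default behavior), so when several anchors share a line, a single match runs from the first <a to the last </a>. Try non-greedy matches (`*?`) instead; note that grep -E does not support them, so you would need grep -P (GNU grep), or replace .* with negated classes such as [^>]*. – user2926055 Apr 11 '16 at 18:18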
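
A minimal sketch of the greedy problem, assuming GNU grep (for \s) and assuming the page's anchors sit on one line, as in minified HTML; the sample line and variable names are hypothetical. Because grep -E has no lazy *?, this swaps in a different technique: negated character classes that cannot run past the next >, quote, or <.

```shell
#!/usr/bin/env bash
# Hypothetical input: two anchors on ONE line.
html="<a href=\"http://google.com\">Google.com</a> <A target=\"_blank\" href='http://example.com/files.html'>An Example</A>"

# Greedy: .* stretches from the first <a to the LAST </A>, so one match.
greedy="<[aA].*\s(HREF|href)\s?=\s?('|\").*('|\")>.*</[aA]>"

# Bounded: [^>]*, [^'\"]* and [^<]* cannot cross >, quotes or <,
# so each match stops at the nearest closing tag, giving two matches.
bounded="<[aA][^>]*\s(HREF|href)\s?=\s?('|\")[^'\"]*('|\")[^>]*>[^<]*</[aA]>"

echo "greedy:"
printf '%s\n' "$html" | grep -E -o "$greedy"     # one long match
echo "bounded:"
printf '%s\n' "$html" | grep -E -o "$bounded"    # one match per anchor
```

The trade-off: [^<]* rejects link text that contains other tags (for example <b>), so those anchors are skipped. If your grep is GNU grep built with PCRE support, grep -P does accept lazy quantifiers such as .*?.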