I am on Ubuntu 10.10 and using grep to process some HTML files.
Here is the HTML snippet:
<a href="video.php?video=one-hd.mov"><img src="/1.jpg"><a href="video.php?video=normal.mov"><img src="/2.jpg"><a href="video.php?video=another-hd.mov">
I would like to extract one-hd.mov and another-hd.mov but ignore normal.mov.
Here is my code:
example='<a href="video.php?video=one-hd.mov"><img src="/1.jpg"><a href="video.php?video=normal.mov"><img src="/2.jpg"><a href="video.php?video=another-hd.mov">'
echo $example | grep -Po '(?<=video.php\?video=).*?(?=-hd.mov">)'
The result is:
one
normal.mov"><img src="/2.jpg"><a href="video.php?video=another
But I want
one
another
The second line of the result does not match what I expected.
Is this because of so-called greedy matching in regular expressions?
I am using grep, but any command-line tool usable from bash (sed, etc.) is welcome to solve this problem.
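One variation that might be worth trying, as a rough sketch only: the `.*?` in the pattern above is already lazy, but `.` can still match a `"`, so a match that starts after `video=` in the normal.mov link can run on into the next link until it reaches `-hd.mov">`. Replacing `.*?` with `[^"]*` should keep each match inside a single href value. This assumes a grep built with PCRE support (the `-P` option) and that the href values never contain a double quote; the variable name `result` is just for illustration.

```shell
example='<a href="video.php?video=one-hd.mov"><img src="/1.jpg"><a href="video.php?video=normal.mov"><img src="/2.jpg"><a href="video.php?video=another-hd.mov">'

# [^"]* cannot cross a closing quote, so a match cannot spill into the next link.
# The literal dots are escaped so that they match only a dot.
result=$(printf '%s\n' "$example" | grep -Po '(?<=video\.php\?video=)[^"]*(?=-hd\.mov")')

printf '%s\n' "$result"
```

With the sample string above this should print `one` and `another` on separate lines, and skip normal.mov because the lookahead for `-hd.mov"` never matches inside that href.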
Thanks a lot.