
I need a regex pattern for finding links in HTML. Specifically, I need to extract the link that comes right after the text "Download here:".

Here are some examples of what the HTML looks like:

Download here: <a href="/images/alex.jpg">AlexPicture</a>
Download here: <a href="/images/nat.jpg">NAT</a>
Download here: <a href="/images/dog/pat.jpg">Pat the dog</a>
Download here: <a href="/images/chuchu.jpg.jpg">ChuChu</a>

I need to extract that link (the `href` value), but I am totally new to this and can't get it to work, so I need someone who knows regex to help me out.

  • I'd highly recommend using an HTML scraping library like [BeautifulSoup in Python](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for this rather than parsing HTML with regex; [you don't want to awaken Cthulhu](http://stackoverflow.com/a/1732454/4689736). [Here's a good list from GitHub for different languages](https://github.com/lorien/awesome-web-scraping). – thodic Feb 23 '17 at 15:21
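To illustrate the parser-based approach the comment suggests, here is a minimal sketch that uses Python's standard-library `html.parser` instead of BeautifulSoup (a swapped-in choice so that nothing has to be installed). The class name `DownloadLinkParser` and the helper `extract_download_links` are hypothetical names made up for this example, and it assumes the label text "Download here:" sits directly before the `<a>` tag, as in the question's samples.

```python
from html.parser import HTMLParser


class DownloadLinkParser(HTMLParser):
    """Collects href values of <a> tags that directly follow 'Download here:'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._armed = False  # True when the last text seen ended with the label

    def handle_data(self, data):
        # Arm the parser only if the text right before the next tag is the label.
        self._armed = data.strip().endswith("Download here:")

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self._armed:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        # Any tag consumes the "armed" state, whether or not it was a link.
        self._armed = False


def extract_download_links(html):
    # Hypothetical helper: parse the markup and return the collected links.
    parser = DownloadLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    'Download here: <a href="/images/alex.jpg">AlexPicture</a>\n'
    'Download here: <a href="/images/dog/pat.jpg">Pat the dog</a>\n'
    'Other link: <a href="/about.html">About</a>\n'
)

for link in extract_download_links(sample):
    print(link)
```

Unlike a regex, this keeps working if the `<a>` tag carries extra attributes or uses single quotes around the `href` value.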

1 Answer

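You can use the following regex to find the link after each "Download here:". The positive lookbehind `(?<=...)` requires the text `Download here: <a href="` immediately before the match, the lazy `.*?` captures the link itself, and the positive lookahead `(?=">)` stops the match at the closing `">`, so only the `href` value is returned: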

(?<=Download here:\s<a\shref=").*?(?=">)
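As a rough sketch of how this pattern could be applied, here it is in Python's `re` module (the question does not name a language, so Python is an assumption, as are the names `DOWNLOAD_LINK` and `find_download_links`). Python only allows fixed-width lookbehinds, which this pattern satisfies because each `\s` matches exactly one character.

```python
import re

# The pattern from the answer: lookbehind for the prefix, lazy match for the
# link, lookahead for the closing `">`.
DOWNLOAD_LINK = re.compile(r'(?<=Download here:\s<a\shref=").*?(?=">)')


def find_download_links(html):
    # Hypothetical helper: return every href that directly follows the label.
    return DOWNLOAD_LINK.findall(html)


sample = '''Download here: <a href="/images/alex.jpg">AlexPicture</a>
Download here: <a href="/images/nat.jpg">NAT</a>
Download here: <a href="/images/dog/pat.jpg">Pat the dog</a>
Download here: <a href="/images/chuchu.jpg.jpg">ChuChu</a>'''

for link in find_download_links(sample):
    print(link)
```

Note that the pattern expects exactly one whitespace character after "Download here:" and after `<a`, and it expects `href` to be the first and only attribute of the tag. Markup that deviates from the examples in the question will not match.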


m87