
There are many posts on Stack Overflow about using regular expressions to extract links. I tried one that did return links of the form https://www.soup.com/myJscriptFile.js (and others), but it also returned CSS styles and other unwanted results. It also did not return links of the form

a href="subfolder/file.js"

So I tried my own regular expression as follows:

/<a href=.*\>

I thought that I could use ordinary string functions to strip off the 'href' and the brackets, but this does not work. I tried a question mark after the asterisk to prevent 'greedy' behavior, and that did not work either. I am out of my depth. Any help is appreciated.
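For illustration only, here is a minimal sketch of the lazy-quantifier idea with a capture group, so the URL comes back directly and nothing has to be stripped afterwards. It is written in Python rather than VB.NET, the sample HTML and the names `HREF_RE` and `extract_links` are made up for the example, and it assumes the href values are quoted. The pattern syntax itself should carry over to other regex engines, but that is not verified here.

```python
import re

# Hypothetical sample input; a real page would be downloaded first.
html = '''
<link rel="stylesheet" href="style.css">
<a href="subfolder/file.js">script</a>
<a class="doc" href='docs/report.pdf'>report</a>
<a href="https://www.soup.com/myJscriptFile.js">remote</a>
'''

# <a\b           an opening anchor tag (not <abbr>, <link>, ...)
# [^>]*?         any other attributes, lazily, never crossing the closing '>'
# \bhref\s*=\s*  the href attribute
# (["'])         the opening quote (group 1)
# (.*?)          the URL itself, lazily (group 2)
# \1             the same quote character that opened the value
HREF_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
                     re.IGNORECASE | re.DOTALL)

def extract_links(text, extensions=(".js", ".pdf")):
    """Return href values of <a> tags whose URL ends with one of the extensions."""
    urls = [m.group(2) for m in HREF_RE.finditer(text)]
    return [u for u in urls if u.lower().endswith(extensions)]

links = extract_links(html)
print(links)
```

The `[^>]*?` part is what keeps the match inside a single tag. A bare `.*` runs to the last `>` on the line, which is the greedy behavior described above.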

  • What exactly is your input, and what do you want the result to be? And which regex flavor are you using (JavaScript, PHP, .NET, LISP)? See also http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – rene Jun 22 '14 at 18:29
  • If you are struggling with regex, search online for this first (which, I think, is what's expected on this site): http://www.mkyong.com/regular-expressions/how-to-extract-html-links-with-regular-expression/ and http://social.msdn.microsoft.com/Forums/en-US/353a7c05-212e-45a0-84c4-8fc0ab8fac2a/regex-to-pull-out-all-html-anchor-tags-href-plus-the-link-text?forum=regexp – Omer Iqbal Jun 22 '14 at 18:31
  • I'm using Visual Basic .NET's regex support, and I did look at a regular expression tutorial before posting here. I read in the HTML of a web page and, within that, try to find all links to PDFs, JavaScript files, and optionally other types such as CSS files. – user3765253 Jun 22 '14 at 19:50
  • Using regex on html is kind of nightmarish, not impossible if you're willing to overlook all that goes with real html parsing. –  Jun 22 '14 at 22:54
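
To make the last comment concrete, here is a rough sketch of the parser-based alternative, using Python's standard `html.parser` module instead of a regex. It is only an illustration: the class name `LinkCollector`, the helper `collect_links`, and the sample markup are hypothetical, and a VB.NET solution would need an HTML parsing library of its own.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)

def collect_links(text):
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links

# Hypothetical sample: mixed case, an unquoted value, and a non-anchor tag.
sample = ('<p><a href="subfolder/file.js">js</a> '
          '<A HREF=docs/a.pdf>pdf</A> <link href="s.css"></p>')
found = collect_links(sample)
print(found)
```

The parser handles details such as attribute order, quoting style, and letter case, which is where hand-written patterns tend to break. Filtering by extension can then be done with ordinary string functions on the collected values.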
