
I'm trying to use jQuery to get all the URLs from a page so that I can request them later with $.get(). If the links were on the same page that includes the script, I could simply do something like:

var links = document.getElementsByTagName("a");
for(var i=0; i<links.length; i++) {
    alert(links[i].href);
}

Here I'm just using alert to check that the links were actually parsed. But how can I do the same thing with a URL that is not the current page? Any help would be appreciated. Maybe I'm missing something ridiculously simple, but I am really stumped when it comes to anything JavaScript/jQuery related.

skulpt
  • If it's not on the current page, it's not in the document model (DOM). – brk May 22 '17 at 15:10
  • To access the content of a page on a different domain, that page must be written to allow you to do so; it's not possible (in the client) by default (Same Origin Policy). – Alex K. May 22 '17 at 15:10
  • Assuming it does, how would I go about that? – skulpt May 22 '17 at 15:11
  • You would have to: 1. `$.get()` the other page, 2. use an HTML parser to parse the HTML source into a DOM object, and 3. search that for links. –  May 22 '17 at 15:11
  • Given that an arbitrary URL won't allow you to, see [jquery .load() page then parse html](https://stackoverflow.com/questions/3856590/jquery-load-page-then-parse-html) – Alex K. May 22 '17 at 15:13
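As the comments point out, the Same Origin Policy decides whether `$.get()` can read another page without that server's cooperation. Two URLs share an origin when their scheme, host and port all match; anything else is a cross-origin request, which only succeeds if the other server opts in (typically with an `Access-Control-Allow-Origin` header). A minimal sketch of that check using the standard `URL` class; `isSameOrigin` is a hypothetical helper name, not part of jQuery:

```javascript
// Hypothetical helper: does `url` (absolute or relative) share an origin with `pageUrl`?
// An origin is scheme + host + port; default ports (443 for https, 80 for http)
// are normalized away by the URL parser.
function isSameOrigin(url, pageUrl) {
  return new URL(url, pageUrl).origin === new URL(pageUrl).origin;
}

console.log(isSameOrigin("/other.html", "https://example.com/page.html"));             // true
console.log(isSameOrigin("https://api.example.com/", "https://example.com/page.html")); // false
```

In the browser you would pass `window.location.href` as `pageUrl`; a `false` result means the request depends on the other server's CORS headers.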

3 Answers


Blatantly copying this answer by Nick Craver (go upvote it), but modifying it for your use case:

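$.get("page.html", function(html) {
  // Parse the markup ($.parseHTML drops <script> elements by default) and wrap it
  // in a <div> so that .find() also reaches <a> elements at the top level.
  var $page = $("<div>").append($.parseHTML(html));
  var links = $page.find("a");
  // do stuff with links
});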

Note that this will only work if the page you're requesting is on the same origin, or if its server allows cross-origin requests (CORS). If it doesn't, you'll need to fetch the page and run it through a DOM parser on a backend server instead. Node.js has some great options for that, including jsdom.
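One thing to watch for with this approach: relative links such as `href="contact.html"` were written relative to page.html, but once the fetched markup is parsed inside your page, an element's `.href` property will typically resolve them against your current page's URL. A safer route is to read the raw attribute (e.g. `$(el).attr("href")`) and resolve it yourself with the standard `URL` constructor. The sketch below uses a hypothetical helper name, `resolveLinks`, and assumes you know the fetched page's absolute URL; dropping non-HTTP(S) links and `#fragment` parts is a design choice, not a requirement.

```javascript
// Hypothetical helper: turn raw href attribute values into absolute,
// de-duplicated URLs, resolved against the page they were fetched from.
function resolveLinks(rawHrefs, pageUrl) {
  const seen = new Set();
  const result = [];
  for (const raw of rawHrefs) {
    if (raw == null || raw.trim() === "") continue; // <a> without a usable href
    let url;
    try {
      url = new URL(raw.trim(), pageUrl); // relative -> absolute, based on pageUrl
    } catch (e) {
      continue; // not a parseable URL
    }
    // Keep only links you could request with $.get(); skip mailto:, javascript:, etc.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // "#section" fragments point at the same resource
    if (!seen.has(url.href)) {
      seen.add(url.href);
      result.push(url.href);
    }
  }
  return result;
}
```

Inside the `$.get` callback you might call it as `resolveLinks(links.map(function () { return $(this).attr("href"); }).get(), new URL("page.html", location.href).href)`. The base is made absolute first because `new URL` throws when given a relative base.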

SethWhite

You will have to get the other page via an HTTP request ($.get in jQuery does this), and then either convert that HTML into a DOM that jQuery can traverse to find the <a> tags for you, or use another method, such as a regular expression, to find all the links within the returned markup.

Edit: You probably shouldn't actually use a regex unless you can guarantee the format of the HTML, including the format of every <a> tag on the page. At that point, it's probably easier to just parse the HTML properly.
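To make that caveat concrete, here is a sketch of the naive regex approach, assuming every link is written exactly as `<a ... href="...">` with a double-quoted value (`naiveHrefs` is a hypothetical name). The second call shows how it fails once that assumption is broken: it returns a link that sits inside an HTML comment and misses a single-quoted one.

```javascript
// Hypothetical helper. ASSUMES every link looks like <a ... href="..."> with double quotes.
function naiveHrefs(html) {
  const re = /<a\s+[^>]*?href="([^"]*)"/gi; // case-insensitive, global
  const hrefs = [];
  let match;
  while ((match = re.exec(html)) !== null) {
    hrefs.push(match[1]); // captured attribute value
  }
  return hrefs;
}

console.log(naiveHrefs('<a href="/a">A</a> <A class="x" href="/b">B</A>'));
// [ '/a', '/b' ]  (format respected, so this works)
console.log(naiveHrefs("<!-- <a href=\"/old\">old</a> --> <a href='/c'>C</a>"));
// [ '/old' ]  (wrong: the commented-out link is returned and '/c' is missed)
```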

varbrad
  • Please do not parse HTML with a regex! https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not – Pevara May 22 '17 at 15:19
  • It can be done if you can guarantee the markup format, but you're right: it's probably just easier to parse the HTML properly and go from there. – varbrad May 22 '17 at 15:31

Get the current page URL with window.location.href, then compare it with the href of each "a" tag in the loop:

var links = document.getElementsByTagName("a");
var thisHref = window.location.href;
for (var i = 0; i < links.length; i++) {
    var tempLink = links[i].href; // declared with var to avoid an implicit global
    if (tempLink !== thisHref) { // only report links that differ from the current page URL
        alert(tempLink);
    }
}
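One caveat with the exact string comparison above: a link such as `page.html#comments` produces an `href` that differs from `window.location.href` only by its fragment, so it is still reported even though it points at the current page. If that matters, here is a sketch that normalizes both sides with the standard `URL` class and ignores the fragment (the helper name `linksToOtherPages` is hypothetical):

```javascript
// Hypothetical helper: keep only hrefs that point somewhere other than currentHref.
// Fragments ("#...") are ignored, so "page.html#top" counts as the current page.
function linksToOtherPages(hrefs, currentHref) {
  const withoutHash = (href) => {
    const url = new URL(href, currentHref); // also resolves relative hrefs
    url.hash = "";
    return url.href;
  };
  const current = withoutHash(currentHref);
  return hrefs.filter((href) => withoutHash(href) !== current);
}
```

On the page itself you could call `linksToOtherPages(Array.from(document.getElementsByTagName("a"), function (a) { return a.href; }), window.location.href)`. `new URL` throws on malformed input, so wrap the call in `try`/`catch` if the page may contain odd `href` values.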
Sinha