-1

I've opened a .php page from a website with a bunch of hyperlinks on it. I want to copy their URLs into a .txt file. I could do that manually, but there are too many of them, so I'd like to do it automatically.

Previously I would have done it this way: look at the page source (its HTML code) and parse it with a small script written for the purpose. But this is a .php page, and I guess the links are pulled in from a database on the server rather than written in the source. Either way, they are not in the page's HTML code.

I wonder whether this is still possible. I believe it should be: all the links are displayed on my screen, and they are all clickable and working, so there should be some way of capturing them.

brilliant

2 Answers

3

As I understand it, you want to do this from the browser itself. In that case, open the debug panel in Chrome (press F12), go to the Console tab, paste the following code and press Enter. Then copy the list of links from the console and put it in a .txt file.

var tags = document.getElementsByTagName("a");
for(var i=0;i<tags.length;i++) {
    console.log(tags[i].getAttribute("href"));
}
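
If the page repeats links or uses relative addresses, a slightly longer variant may be handier. The following is only a sketch: `collectLinks` is a made-up helper name, and it reads the `.href` property (which the browser resolves to an absolute URL) instead of the raw attribute, skipping empty values and duplicates.

```javascript
// Collect the unique, non-empty link targets from a list of anchor elements.
// `collectLinks` is a hypothetical helper name, not part of any browser API.
function collectLinks(anchors) {
    var seen = new Set();
    for (var i = 0; i < anchors.length; i++) {
        // In a browser, the .href property holds the fully resolved (absolute) URL.
        var url = anchors[i].href;
        if (url) {
            seen.add(url);
        }
    }
    // A Set keeps insertion order, so links come out in page order.
    return Array.from(seen);
}

// Only runs where `document` exists, e.g. in the browser console.
if (typeof document !== "undefined") {
    console.log(collectLinks(document.getElementsByTagName("a")).join("\n"));
}
```

This prints one URL per line, which pastes straight into a .txt file. In Chrome's console you could also try `copy(collectLinks(document.getElementsByTagName("a")).join("\n"))` to put the list on the clipboard; `copy()` is a console utility, so it is not available to ordinary page scripts.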
Dharmesh Patel
  • Make sure your console is filtered to all, and not debug. [See Image - Chrome](http://i.imgur.com/kxEil4x.png) – ʰᵈˑ Jan 03 '14 at 12:35
  • WOW!!! It worked just like that! Thank you. Can you, please, tell me what language is your code written in? – brilliant Jan 03 '14 at 12:42
  • it's simple Javascript :) – Dharmesh Patel Jan 03 '14 at 12:43
  • Ah! I see. I didn't know that Chrome accepts Javascript. Thanks again! – brilliant Jan 03 '14 at 12:45
  • @HarryDenley - Thank you! Do you know any resource on the internet where I could learn how to use that console with Javascript? – brilliant Jan 03 '14 at 13:03
  • I don't have any link to learn how to use console but if you know Javascript you can write any Javascript code in console and it will execute it. – Dharmesh Patel Jan 03 '14 at 13:07
  • @DharmeshPatel - Oh, I see, thank you. Guess it's about time I started learning Java - have already stumbled upon many cases when it's proved to be quite useful. – brilliant Jan 03 '14 at 13:13
  • Java and Javascript are totally different. I think what you meant, from what's supplied in this thread, is that you think it's time to learn JavaScript. – ʰᵈˑ Jan 03 '14 at 13:32
  • @HarryDenley - Oh my! I didn't know they were two different things. Thanks for letting me know! – brilliant Jan 04 '14 at 05:11
0

Here is what you need to do.

Use PHP's cURL library to get the page as a string, or better yet, use file_get_contents:

http://au1.php.net/file_get_contents

$homepage = file_get_contents('http://www.example.com/');

Use the DOMDocument class to build an HTML document: http://au1.php.net/domdocument

libxml_use_internal_errors(true); // keep warnings about imperfect real-world HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($homepage);

From here you can get all the <a> tags in the HTML and read their href attributes, by calling $elements = $doc->getElementsByTagName("a");

Then just iterate over the elements, pulling the href out of each one.

foreach($elements as $el) {
    $link = $el->getAttribute("href");
    echo $link . "\n";
}
//untested code

You can then re-use the script on any page; just change the URL passed to file_get_contents (or your cURL request).
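
One thing to watch for: href values read with getAttribute are returned exactly as written in the markup, so some may be relative. Below is a rough sketch of turning them into absolute URLs, written in JavaScript to match the console snippet in the other answer. `toAbsolute` is a made-up helper name, and `base` is assumed to be the address of the page the links were taken from.

```javascript
// Resolve raw href values against the address of the page they came from.
// `toAbsolute` is a hypothetical helper; `base` is the page's own URL.
function toAbsolute(hrefs, base) {
    var result = [];
    for (var i = 0; i < hrefs.length; i++) {
        try {
            // The URL constructor resolves relative references against `base`
            // and leaves already-absolute URLs unchanged.
            result.push(new URL(hrefs[i], base).href);
        } catch (e) {
            // Skip values that cannot be parsed as a URL.
        }
    }
    return result;
}

var links = toAbsolute(
    ["/a.php?id=1", "b.html", "http://other.example/x"],
    "http://www.example.com/dir/page.php"
);
console.log(links.join("\n"));
```

With the sample input, "/a.php?id=1" resolves against the site root and "b.html" against the page's directory, while the third entry is already absolute and passes through unchanged.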

ddoor