How to get all the links from a .php page?

Question

I've opened a .php page on a website that has a bunch of hyperlinks on it. I want to copy them (their URLs) into a .txt file. I could do that manually, of course, but there are too many of them, so I would like to do it automatically.

Previously I would have done it this way: look at the page source (its HTML code) and parse it with a small script written specially for that. But this is a .php page, and I guess all the links are pulled in from a database on the server rather than from the source code. In any case, they are not in the page's HTML code.

I wonder if that is still possible. I believe it should be: all the links are displayed on my screen, and they are all clickable and working, so there should be some way of capturing them.

You can use the same script to parse the links. Did you try that? — Amal Murali, Jan 03 '14 at 12:13
If they don't show in the source, then they are added by javascript, not php — Steve, Jan 03 '14 at 12:13
Have you tried [preg_match_all](http://php.net/preg_match_all) ? — ʰᵈˑ, Jan 03 '14 at 12:14
Maybe you'll find what you're looking for here http://stackoverflow.com/questions/34120/html-scraping-in-php? — Christian Dechery, Jan 03 '14 at 12:15
Using `file_get_contents()`, you can also do it with the same script — Alireza Fallah, Jan 03 '14 at 12:15
@HarryDenley - "Have you tried preg_match_all ?" - NO, I haven't. What should I do in order to start using it? Is it a special software or a programming language that I need to install first? How? — brilliant, Jan 03 '14 at 12:23
@user574632 - "...they are added by javascript, not php" - Thanks, I didn't know that. — brilliant, Jan 03 '14 at 12:24
@brilliant I'm sure that's your issue, in which case none of these answers will help you. Can you confirm that's the case (that the links don't appear in the source)? Then I can help you — Steve, Jan 03 '14 at 12:25
@user574632 - "Can you confirm that's the case" - I am confirming: they are NOT in the page's HTML source. — brilliant, Jan 03 '14 at 12:27
@brilliant OK, then you need a JS-enabled DOM parser to get the info. What programming languages are you familiar with? EDIT: if this is only for a few pages Dharmesh's answer is ideal — Steve, Jan 03 '14 at 12:28
@user574632 - I am only familiar with AHK (www.autohotkey.com) — brilliant, Jan 03 '14 at 12:31
@user574632 "if this is only for a few pages Dharmesh's answer is ideal" - Thank you. I am trying to use his way now. — brilliant, Jan 03 '14 at 12:43
@user574632 - Thank you! Dharmesh's code did it so well indeed! — brilliant, Jan 03 '14 at 12:44

score 3 · Accepted Answer · answered Jan 03 '14 at 12:28

As I understand it, you want to do this from the browser itself. In that case, open Chrome's developer tools (press F12), go to the Console tab, paste the following code and press Enter. Then copy the list of links from the console and paste it into a .txt file.

```javascript
// Log the href attribute of every <a> element on the page, one per line
var tags = document.getElementsByTagName("a");
for (var i = 0; i < tags.length; i++) {
    console.log(tags[i].getAttribute("href"));
}
```
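A slightly extended sketch of the same idea (not part of the original answer): instead of `getAttribute("href")`, which returns the raw and possibly relative value, it reads each link's `href` property, which the browser resolves to an absolute URL. It also drops duplicates and returns the list rather than logging it. `collectLinks` is a hypothetical helper name chosen for illustration.

```javascript
// Collect the resolved URL of every link currently in the DOM.
// `root` is expected to be `document` (or any element) in the browser.
function collectLinks(root) {
  const seen = new Set(); // a Set keeps first-seen order and skips duplicates
  for (const a of root.querySelectorAll("a[href]")) {
    // The `href` property is already resolved against the page URL
    seen.add(a.href);
  }
  return [...seen];
}
```

Pasted into Chrome's console, `copy(collectLinks(document).join("\n"))` should put the whole list on the clipboard through the console's built-in `copy()` utility, ready to paste into a .txt file.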

Dharmesh Patel

Make sure your console filter is set to All, not Debug. [See Image - Chrome](http://i.imgur.com/kxEil4x.png) – ʰᵈˑ Jan 03 '14 at 12:35
WOW!!! It worked just like that! Thank you. Can you, please, tell me what language is your code written in? – brilliant Jan 03 '14 at 12:42
it's simple Javascript :) – Dharmesh Patel Jan 03 '14 at 12:43
Ah! I see. I didn't know that Chrome accepts Javascript. Thanks again! – brilliant Jan 03 '14 at 12:45
@HarryDenley - Thank you! Do you know any resource on the internet where I could learn how to use that console with Javascript? – brilliant Jan 03 '14 at 13:03
I don't have a link for learning the console, but if you know Javascript you can type any Javascript code into the console and it will execute it. – Dharmesh Patel Jan 03 '14 at 13:07
@DharmeshPatel - Oh, I see, thank you. Guess it's about time I started learning Java - have already stumbled upon many cases when it's proved to be quite useful. – brilliant Jan 03 '14 at 13:13
Java and Javascript are totally different. I think what you meant, from what's supplied in this thread, is that you think it's time to learn JavaScript. – ʰᵈˑ Jan 03 '14 at 13:32
@HarryDenley - Oh my! I didn't know they were two different things. Thanks for letting me know! – brilliant Jan 04 '14 at 05:11

ddoor · Answer 2 · 2014-01-03T12:20:11.623

What you need to do:

Use PHP's cURL library to fetch the page as a string, or, better yet, use `file_get_contents`:

http://au1.php.net/file_get_contents

```php
$homepage = file_get_contents('http://www.example.com/');
```

Use the DOMDocument class to build an HTML document: http://au1.php.net/domdocument

```php
libxml_use_internal_errors(true); // silence warnings about malformed real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($homepage);
```

From there, get all the `<a>` tags in the document with `$elements = $doc->getElementsByTagName("a");`, then iterate over the elements and read each one's `href` attribute:

```php
// Print the href attribute of every <a> element, one per line
foreach ($elements as $el) {
    $link = $el->getAttribute("href");
    echo $link . "\n";
}
//untested code
```

You can then reuse the script on any page; just change the URL you fetch (or the cURL request, if you use cURL).
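For comparison, here is a rough JavaScript (Node.js) sketch of the same fetch-then-extract idea. It swaps the DOM parser for a regular expression (closer to the `preg_match_all` route mentioned in the comments), so it is fragile: it only handles quoted `href` values and, like any approach that reads the raw HTML, it cannot see links that JavaScript adds after the page loads. `extractHrefs` is a hypothetical helper name.

```javascript
// Pull quoted href values out of raw HTML with a regular expression and
// resolve them against the page URL. Unquoted attributes are not matched.
function extractHrefs(html, baseUrl) {
  // <a ...href="..."> or <a ...href='...'>; \1 closes with the same quote
  const pattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  const seen = new Set(); // keeps first-seen order and skips duplicates
  for (const match of html.matchAll(pattern)) {
    try {
      seen.add(new URL(match[2], baseUrl).href); // relative -> absolute
    } catch {
      // skip values that cannot be parsed as URLs
    }
  }
  return [...seen];
}
```

Assuming Node 18 or later (which provides a global `fetch`), something like `fetch(url).then(r => r.text()).then(html => require("fs").writeFileSync("links.txt", extractHrefs(html, url).join("\n")))` would write the list to a text file.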
