Processing a string to get a download URL

Question

I want to get the download URLs for the x86 and x64 offline installers from https://www.java.com/de/download/manual.jsp as strings. How can I do this?

I can fetch the page with file_get_contents():

$page = file_get_contents('https://www.java.com/de/download/manual.jsp');

Which functions do I need to process the string?

I need these parts of the source code:

<a title="Download der Java-Software für Windows Offline" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=113217">
Windows Offline</a>

and

<a title="Download der Java-Software für Windows (64-Bit)" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=113219">
Windows Offline (64-Bit)</a>

The problem is that the URL might change after a version release.

Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) — Oguz Ozgul, Nov 26 '15 at 23:05

score 0 · Answer 1 · answered Nov 26 '15 at 22:19

preg_match() will do the trick:

preg_match("'<a title=\"Download der Java-Software für Windows Offline\" href=\"(.*?)\">(.*?)</a>'si", $source, $match);

For the 64-bit version it's similar:

preg_match("'<a title=\"Download der Java-Software für Windows \(64-Bit\)\" href=\"(.*?)\">(.*?)</a>'si", $source, $match);

In both cases, $match[1] will hold the download link. These patterns rely on the text in the "title" attribute, so as long as that text stays the same, it won't be a problem if the download links change.
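To show what the capture group returns, here is a minimal sketch that runs against a hard-coded copy of the markup quoted in the question rather than the live page. The extractHref() helper is hypothetical and not part of the answer above. The u modifier is also my addition and assumes both the script and the page are UTF-8.

```php
<?php
// Sample markup copied from the question; the live page may differ.
$source = '<a title="Download der Java-Software für Windows Offline" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=113217">
Windows Offline</a>
<a title="Download der Java-Software für Windows (64-Bit)" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=113219">
Windows Offline (64-Bit)</a>';

// Hypothetical helper: returns the href of the <a> tag whose title
// attribute equals $title, or null if no such tag is found.
function extractHref(string $source, string $title): ?string
{
    // preg_quote() escapes regex metacharacters such as the parentheses in "(64-Bit)".
    $pattern = '~<a title="' . preg_quote($title, '~') . '" href="(.*?)">~su';

    // Group 1 is the lazily matched href value.
    return preg_match($pattern, $source, $match) === 1 ? $match[1] : null;
}

$x86 = extractHref($source, 'Download der Java-Software für Windows Offline');
$x64 = extractHref($source, 'Download der Java-Software für Windows (64-Bit)');

echo $x86, "\n"; // http://javadl.sun.com/webapps/download/AutoDL?BundleId=113217
echo $x64, "\n"; // http://javadl.sun.com/webapps/download/AutoDL?BundleId=113219
```

Checking the return value of preg_match() before reading $match[1] avoids an undefined-index notice if the page layout changes and the pattern stops matching.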

Thank you very much. It worked for me. A very interesting and useful function. — , Nov 28 '15 at 16:54

score 0 · Answer 2 · answered Nov 28 '15 at 16:56

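// Fetch the download page as a string.
$page = file_get_contents('https://www.java.com/de/download/manual.jsp');

// Match the whole <a> element of each installer by its title attribute.
preg_match("'<a title=\"Download der Java-Software für Windows Offline\" href=\"(.*?)\">(.*?)</a>'si", $page, $match);
preg_match("'<a title=\"Download der Java-Software für Windows \(64-Bit\)\" href=\"(.*?)\">(.*?)</a>'si", $page, $match1);

// $match[0] is the full matched element; $match[1] already holds the href value.
$d_x86 = $match[0];
$d_x64 = $match1[0];

// Pull the URL out of each matched element (dots and the question mark are escaped).
preg_match("'https?://\w+\.\w+\.\w+/\w+/\w+/\w+\?\w+=\d+'", $d_x86, $match3);
preg_match("'https?://\w+\.\w+\.\w+/\w+/\w+/\w+\?\w+=\d+'", $d_x64, $match4);

$d_x86_url = $match3[0];
$d_x64_url = $match4[0];

// Link texts: "Download latest JRE for Windows x86" / "... x64".
echo "<a href=\"$d_x86_url\">Download aktuellste JRE für Windows x86</a><br>";
echo "<a href=\"$d_x64_url\">Download aktuellste JRE für Windows x64</a>";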

score 0 · Answer 3 · answered Nov 28 '15 at 17:21

I suggest using PHP's DOM extension to access the required nodes and attributes in your HTML document:

<?php

$dom = new DOMDocument();
$dom->loadHTMLFile('https://www.java.com/de/download/manual.jsp');//load and parse document

$links = $dom->getElementsByTagName('a');//get all 'a' tags in document
foreach ($links as $link) {//iterate on all 'a' tags
    if($link->getAttribute('title') == 'Download der Java-Software für Windows Offline')
    {
        echo $link->getAttribute('href') . '<br/>';//print the download URL ($link->nodeValue would give the link text instead)
    }
}

?>
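As a variation on the loop above, the same lookup can be sketched with DOMXPath, here against a hard-coded copy of the markup from the question rather than the live page. The hrefByTitle() helper is hypothetical. Matching on ASCII-only fragments of the title ("Windows Offline", "(64-Bit)") is my assumption, chosen so the lookup does not depend on how the parser decodes the "ü".

```php
<?php
// Sample markup copied from the question; the live page may differ.
$html = '<html><body>
<a title="Download der Java-Software für Windows Offline" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=113217">Windows Offline</a>
<a title="Download der Java-Software für Windows (64-Bit)" href="http://javadl.sun.com/webapps/download/AutoDL?BundleId=113219">Windows Offline (64-Bit)</a>
</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: href of the first <a> whose title contains $needle,
// or null if there is none. $needle must not contain double quotes.
function hrefByTitle(DOMXPath $xpath, string $needle): ?string
{
    $nodes = $xpath->query('//a[contains(@title, "' . $needle . '")]/@href');

    return ($nodes !== false && $nodes->length > 0)
        ? $nodes->item(0)->nodeValue
        : null;
}

$x86 = hrefByTitle($xpath, 'Windows Offline');
$x64 = hrefByTitle($xpath, '(64-Bit)');

echo $x86, "\n"; // http://javadl.sun.com/webapps/download/AutoDL?BundleId=113217
echo $x64, "\n"; // http://javadl.sun.com/webapps/download/AutoDL?BundleId=113219
```

For the real page, $dom->loadHTML($html) would be replaced by the loadHTMLFile() call from the answer. The helper returns null when nothing matches, so a changed page layout shows up as a missing value instead of an error.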
