
[Updated At Bottom]
Hi everyone.

Start With Short URLs:
Imagine that you've got a collection of 5 short URLs (like http://bit.ly) in a PHP array, like this:

$shortUrlArray = array("http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123");

End with Final, Redirected URLs:
How can I get the final URL of each of these short URLs with PHP? Like this:

http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html

I have one method (found online) that works well with a single URL, but when looping over multiple URLs, it only works for the final URL in the array. For your reference, the method is this:

function get_web_page( $url ) 
{ 
    $options = array( 
        CURLOPT_RETURNTRANSFER => true,     // return web page 
        CURLOPT_HEADER         => true,    // return headers 
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects 
        CURLOPT_ENCODING       => "",       // handle all encodings 
        CURLOPT_USERAGENT      => "spider", // who am i 
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect 
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect 
        CURLOPT_TIMEOUT        => 120,      // timeout on response 
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects 
    ); 

    $ch      = curl_init( $url ); 
    curl_setopt_array( $ch, $options ); 
    $content = curl_exec( $ch ); 
    $err     = curl_errno( $ch ); 
    $errmsg  = curl_error( $ch ); 
    $header  = curl_getinfo( $ch ); 
    curl_close( $ch ); 

    //$header['errno']   = $err; 
    //$header['errmsg']  = $errmsg; 
    //$header['content'] = $content; 
    print($header[0]); 
    return $header; 
}  


//Using the above method in a for loop

$finalURLs = array();

$lineCount = count($shortUrlArray);

for($i = 0; $i < $lineCount; $i++){ // "<", not "<=": the last valid index is $lineCount - 1

    $singleShortURL = $shortUrlArray[$i];

    $myUrlInfo = get_web_page( $singleShortURL ); 

    $rawURL = $myUrlInfo["url"];

    array_push($finalURLs, $rawURL);

}

Close, but not enough:
This method works, but only with a single URL. I can't use it in a for loop, which is what I want to do. When I use it in a for loop as in the example above, the first four elements come back unchanged, and only the final element is converted into its final URL. This happens whether the array is 5 elements or 500 elements long.

Solution Sought:
Please give me a hint as to how you'd modify this method to work inside a for loop with a collection of URLs (rather than just one).

-OR-

If you know of code that is better suited for this task, please include it in your answer.

Thanks in advance.

Update:
After some further prodding I've found that the problem lies not in the above method (which, after all, seems to work fine in for loops) but possibly in encoding. When I hard-code an array of short URLs, the loop works fine. But when I pass in a block of newline-separated URLs from an HTML form using GET or POST, the problem described above occurs. Are the URLs somehow being changed into a format that is incompatible with the method when I submit the form?

New Update:
You guys, I've found that my problem was due to something unrelated to the above method. When I looked at the URL-encoded form data, what I thought were just newline characters (separating the URLs) turned out to be %0D%0A, a carriage return followed by a line feed. My explode was splitting on the line feed only, so every short URL except the final one in the collection had a "ghost" carriage-return character appended to its tail, which made it impossible to retrieve the final URLs for those. I identified the ghost character, corrected my PHP explode, and everything works fine now. Sorry, and thanks.
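A minimal sketch of that kind of corrected split, assuming the form text arrives with "\r\n" between lines (which is what %0D%0A indicates). The helper `split_url_lines()` and the sample `$urlBlock` are hypothetical; the point is that `explode("\n", ...)` keeps the carriage return on every line but the last, while `preg_split('/\R/', ...)` treats any line ending as a separator:

```php
<?php
// Hypothetical form value: textarea text submitted with "\r\n" (%0D%0A) between lines.
$urlBlock = "http://bit.ly/123\r\nhttp://bit.ly/456\r\nhttp://bit.ly/789";

// Naive split: every element except the last keeps a trailing "\r" (the "ghost" character).
$naive = explode("\n", $urlBlock);

// Hypothetical helper: split on any line ending, then trim and drop blank lines.
function split_url_lines($block)
{
    // \R matches "\r\n", "\n" or "\r"; PREG_SPLIT_NO_EMPTY skips empty pieces
    $lines = preg_split('/\R/', $block, -1, PREG_SPLIT_NO_EMPTY);
    // trim() strips surrounding whitespace; the strlen filter drops lines that were only spaces
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

$clean = split_url_lines($urlBlock);
print_r($clean);
```

Feeding `$clean` rather than `$naive` into the loop gives every short URL a chance to resolve.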

m0rtimer

3 Answers


I think you almost have it there. Try this:

$shortUrlArray = array("http://yhoo.it/2deaFR",
    "http://bit.ly/900913",
    "http://bit.ly/4m1AUx");

$finalURLs = array();

$lineCount = count($shortUrlArray);

for($i = 0; $i < $lineCount; $i++){
    $singleShortURL = $shortUrlArray[$i];
    $myUrlInfo = get_web_page( $singleShortURL );
    $rawURL = $myUrlInfo["url"];
    echo $rawURL . "\n"; // echo, not printf: a "%" in a URL would be read as a format specifier
    array_push($finalURLs, $rawURL);
}
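For a long list, resolving the URLs one after another means waiting for each request in turn. Below is a sketch of a concurrent alternative using PHP's `curl_multi_*` functions; `resolve_final_urls()` is a hypothetical name, and the sketch assumes the shorteners answer body-less (HEAD-style) requests via `CURLOPT_NOBODY`, which some servers may not:

```php
<?php
// Hypothetical helper: resolve several short URLs concurrently with curl_multi.
function resolve_final_urls(array $shortUrls, $timeout = 20)
{
    $mh = curl_multi_init();
    $handles = array();

    foreach ($shortUrls as $i => $url) {
        $ch = curl_init(trim($url));          // trim() guards against a stray "\r"
        curl_setopt_array($ch, array(
            CURLOPT_NOBODY         => true,   // no body needed, only where the redirects end
            CURLOPT_FOLLOWLOCATION => true,   // follow the whole redirect chain
            CURLOPT_MAXREDIRS      => 10,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => $timeout,
        ));
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);           // wait for socket activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $finalUrls = array();
    foreach ($handles as $i => $ch) {
        // CURLINFO_EFFECTIVE_URL is the last URL cURL requested (the 'url' key of curl_getinfo())
        $finalUrls[$i] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $finalUrls;
}
```

Calling `print_r(resolve_final_urls($shortUrlArray));` should print one final URL per key of the input array. If a request fails, the effective URL is just the last one attempted, so real code would also look at `curl_getinfo($ch, CURLINFO_HTTP_CODE)` for each handle.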
ajpyles
  • Thanks for your answer. And sorry, because it seems that both your method and mine DO work fine when the URL array is hard-coded into the PHP script. My problem seems to arise when I use an HTML form (GET/POST) to pass in a list of short URLs and explode them by their newline characters to create the array... A problem of encoding? – m0rtimer Feb 06 '11 at 04:01

This may be of some help: How to put string in array, split by new line?

You would probably do something like this, assuming you're getting the URLs returned in POST:

$final_urls = array();

// You can replace chr(10) with "\n" or "\r\n", depending on how you get your URLs.
// And of course, change $_POST['short_urls'] to the source of your string.
$short_urls = explode( chr(10), $_POST['short_urls'] );

foreach ( $short_urls as $short ) {
    $final_urls[] = get_web_page( $short );
}

I get the following output, using var_dump($final_urls); and your bit.ly URL:

http://codepad.org/8YhqlCo1

And my source: $_POST['short_urls'] = "http://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123";

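I also got an error using your function: Notice: Undefined offset: 0 in /var/www/test.php on line 27 (line 27 is print($header[0]);). curl_getinfo() returns an associative array keyed by names such as "url", so there is no index 0. I'm not sure what you wanted there...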

Here's my test.php, if it will help: http://codepad.org/zI2wAOWL

Micheal
  • Hi Mike. Thanks for this. Before I saw your answer I traced the problem to the newline character actually being a carriage return character at the end of all urls but the last one in the collection, which was not being caught by my explode("\n", $urlBlock) code. Fixed that and now it works. – m0rtimer Feb 06 '11 at 05:44
  • Not a problem Eric. I'm glad you figured it out. One thing you can do is replace the carriage return character with nothing (or a newline character, if there doesn't appear to be any) before you explode on the newline character. – Micheal Feb 08 '11 at 22:26
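A sketch of that suggestion, with `normalize_and_split()` as a hypothetical helper name: turn "\r\n" and any lone "\r" into "\n" first, then explode on "\n" as before:

```php
<?php
// Hypothetical helper: normalise line endings, then explode on "\n".
function normalize_and_split($block)
{
    // str_replace applies the searches in order: "\r\n" -> "\n" first, then any lone "\r" -> "\n"
    $block = str_replace(array("\r\n", "\r"), "\n", $block);
    // Split, trim each line and drop empty ones (e.g. from a trailing newline)
    $urls = array_map('trim', explode("\n", $block));
    return array_values(array_filter($urls, 'strlen'));
}

print_r(normalize_and_split("http://bit.ly/123\r\nhttp://bit.ly/456\r\n"));
```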

I implemented this to get, for each line of a plain text file containing one shortened URL per line, the corresponding redirect URL:

<?php
// input: text file with one bitly-shortened URL per line
$plain_urls = file_get_contents('in.txt');
// split on any line ending ("\r\n", "\n" or "\r") and skip blank lines,
// so no stray "\r" ends up glued to a URL
$bitly_urls = preg_split('/\R/', trim($plain_urls), -1, PREG_SPLIT_NO_EMPTY);

// output: where should we write
$w_out = fopen("out.csv", "a+") or die("Unable to open file!");

foreach($bitly_urls as $bitly_url) {
  $c = curl_init($bitly_url);
  curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36');
  // don't follow redirects: we only want the target of the first redirect
  curl_setopt($c, CURLOPT_FOLLOWLOCATION, 0);
  curl_setopt($c, CURLOPT_HEADER, 1);
  curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 20);
  // curl_setopt($c, CURLOPT_PROXY, 'localhost:9150');
  // curl_setopt($c, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
  $r = curl_exec($c);

  // get the redirect url:
  $redirect_url = curl_getinfo($c)['redirect_url'];
  curl_close($c); // release the handle before the next iteration

  // write output as csv
  $out = '"'.$bitly_url.'";"'.$redirect_url.'"'."\n";
  fwrite($w_out, $out);
}
fclose($w_out);

Have fun and enjoy! pw
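One caveat: with `CURLOPT_FOLLOWLOCATION` set to 0, `redirect_url` is only the first hop, so a short URL that points at another shortener yields the intermediate URL. Also, because `CURLOPT_HEADER` is on, `$r` already holds the raw response headers. On older PHP versions that don't report `redirect_url`, a minimal sketch like the following (the helper name `location_from_headers()` is hypothetical) could pull the `Location` header out of `$r` instead:

```php
<?php
// Hypothetical helper: read the redirect target from the raw response that
// curl_exec() returns when CURLOPT_HEADER is on.
function location_from_headers($rawResponse)
{
    // Headers end at the first blank line; only search that part, not the body.
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    $headerBlock = $parts[0];
    // Header names are case-insensitive (/i); /m lets ^ and $ match per line.
    if (preg_match('/^Location:\s*(\S+)\s*$/mi', $headerBlock, $m)) {
        return $m[1];
    }
    return null; // no Location header: not a redirect
}

$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Server: nginx\r\n"
        . "location: http://www.example.com/some-directory/some-page.html\r\n"
        . "Content-Length: 0\r\n\r\n";
echo location_from_headers($sample), "\n";
```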

p-w