
I'm scraping the page titles of website URLs. This works fine, but I'm running into an issue with pages that involve any kind of redirect.

Instead of saving the title of the final destination page, I get the title of the redirect page.

Is there a way to check whether the page is a redirect and, if so, wait until the target page has loaded before scraping the page title?

I'm using `file_get_html()` from the Simple HTML DOM library; my code currently looks like this:

        //If the user is logged in, use the file_get_html library to scrape the page name.
        if(!$query)
        {
            if (Auth::check())
            {
                $html = file_get_html($long_url);
                $titleraw = $html->find('title',0);
                $title = $titleraw->innertext;
                $link->page_title = $title;
            }
            //Saves the built object to the database.
            $link->save();
            DB::table('users_links')->insert(array('link_id' => $link->id, 'user_id' => $link->users_id, 'privacy' => 0));
        }
samayres1992
  • If the page uses HTTP redirects, you can use `curl` and set the [`CURLOPT_FOLLOWLOCATION`](http://php.net/curl_setopt) option (first sketch below). – kero Jul 19 '14 at 13:14
  • If you're not up for cURL, you can try [follow_location](http://stackoverflow.com/a/12566320) for file_get_contents (second sketch below). – Dave Chen Jul 19 '14 at 13:16
  • 2
    You can use the Guzzle library which has [built-in support](http://guzzle3.readthedocs.org/http-client/http-redirects.html) for following HTTP redirects. –  Jul 19 '14 at 18:46
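
Following up on the cURL comment, here is a minimal sketch, assuming the redirects are ordinary HTTP redirects (301/302) and that Simple HTML DOM's `str_get_html()` is available. `fetch_final_title()` is just an illustrative helper name, not part of either library.

    //A minimal sketch: fetch with cURL, follow redirects, then parse the final page.
    require_once 'simple_html_dom.php'; //provides str_get_html() and file_get_html()

    function fetch_final_title($long_url)
    {
        $ch = curl_init($long_url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //return the body instead of echoing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //follow Location: headers to the target page
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        //guard against redirect loops
        $body = curl_exec($ch);
        curl_close($ch);

        if ($body === false) {
            return null;
        }

        //Parse the final page's HTML, as in the original code.
        $html = str_get_html($body);
        $titleraw = $html ? $html->find('title', 0) : null;

        return $titleraw ? $titleraw->innertext : null;
    }

In the original snippet, `$link->page_title = fetch_final_title($long_url);` could then replace the `file_get_html()` / `find('title', 0)` lines.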
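
If you would rather stay with `file_get_contents()`, the stream-context route from the second comment looks roughly like this. The `http` wrapper already follows redirects by default; `follow_location` and `max_redirects` below only make that explicit, and `str_get_html()` is again assumed to be loaded.

    //A minimal sketch using a stream context instead of cURL.
    $context = stream_context_create(array(
        'http' => array(
            'follow_location' => 1,  //follow redirects (the default for the http wrapper)
            'max_redirects'   => 10, //cap the redirect chain
        ),
    ));

    $body     = file_get_contents($long_url, false, $context);
    $html     = $body !== false ? str_get_html($body) : false;
    $titleraw = $html ? $html->find('title', 0) : null;
    $title    = $titleraw ? $titleraw->innertext : null;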
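
And a rough sketch of the Guzzle suggestion, assuming Guzzle 3 as in the linked docs; it follows HTTP redirects out of the box, so the response body belongs to the target page.

    //A minimal sketch using Guzzle 3, which follows redirects by default.
    require 'vendor/autoload.php';

    use Guzzle\Http\Client;

    $client   = new Client();
    $response = $client->get($long_url)->send();       //redirects are followed automatically
    $html     = str_get_html($response->getBody(true)); //getBody(true) returns the body as a string
    $titleraw = $html ? $html->find('title', 0) : null;
    $title    = $titleraw ? $titleraw->innertext : null;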
