
I'm creating a website that scrapes Google search results with the PHP file_get_contents function. I've already asked about this here, and was told that I should load the page after it has fully loaded, but how should I do this?

My problem is that I want to read out the results. If I go to google.com, every title is an H3, but when I load the page this way, every title has a unique class.

My code

<?php

require 'simple_html_dom.php';

echo '
    <link rel="stylesheet" href="search.css" />
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" />
    <link rel="shortcut icon" type="image/png" href="favicon.png" />
    <body><div class="container">
';

// Avoid an "undefined index" notice when ?q= is missing
$query = isset($_GET['q']) ? $_GET['q'] : '';
if($query == '') {
    echo '<script type="text/javascript">window.location.href="index.html";</script>';
    exit; // stop here instead of searching for an empty query
}

// Escape user input before writing it into the page (prevents XSS)
$safeQuery = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

echo '<title>'.$safeQuery.' | SearchAda</title>';

echo '
    <form action="search.php" method="get">
        <a href="index.html"><h1 class="brand">SearchAda</h1></a>
        <div class="input-group">
            <input type="text" name="q" value="'.$safeQuery.'" placeholder="Type your search query..." />
            <i class="fa fa-search"></i>
        </div>
    </form>
';

// urlencode() also escapes &, #, + and so on, not just spaces
$url = 'https://www.google.com/search?q='.urlencode($query);

$doc = file_get_html($url);
echo $doc;

?>

Screenshots: my search engine (SearchAda) and Google's results.
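One thing worth checking (a hedged suggestion, not a confirmed diagnosis): file_get_contents() does not send a browser-like User-Agent by default, and Google appears to serve a simplified page with different class names to clients it does not recognize. The sketch below sends an explicit User-Agent through a stream context using the real `user_agent` HTTP context option. The helper names make_fetch_context and build_search_url are hypothetical, the User-Agent string is an arbitrary placeholder, and it is an assumption that Google then returns the same markup a desktop browser gets.

```php
<?php
// Sketch: build a stream context that sends an explicit User-Agent header.
// The "http" options also apply to https:// URLs.
function make_fetch_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // placeholder value chosen by the caller
            'timeout'    => 10,         // seconds
        ],
    ]);
}

// Build the search URL; urlencode() turns spaces into "+" and escapes &, #, ?
function build_search_url(string $query): string
{
    return 'https://www.google.com/search?q=' . urlencode($query);
}

// Usage (performs a real network request, so it is left commented out):
// $html = file_get_contents(build_search_url('php dom'), false,
//     make_fetch_context('Mozilla/5.0 (X11; Linux x86_64)'));
```

Whether the returned markup then matches what you see in the browser still depends on Google, so compare the raw HTML before writing selectors against it.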

2 Answers


If you are just downloading a website's source and attempting to display it, you will have problems. All relative resources (<link href="/...">, <script src="/..."> and images) will either need to be downloaded or rewritten to use the original resource directly (you may run into access problems with this). Scripts and CORS will also cause problems on many websites.
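A minimal sketch of the "rewrite to use the original resource" option, assuming you only need to handle root-relative ("/style.css") and scheme-relative ("//host/a.png") URLs. It uses PHP's built-in DOMDocument and DOMXPath; the function name absolutize_urls is hypothetical, and other relative forms ("x", "../x") are deliberately left alone because resolving them needs the page's full base URL.

```php
<?php
// Sketch: point root-relative and scheme-relative href/src attributes
// back at the original site so the copied page can still load them.
function absolutize_urls(string $html, string $origin): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@href or @src]') as $node) {
        foreach (['href', 'src'] as $attr) {
            if (!$node->hasAttribute($attr)) {
                continue;
            }
            $value = $node->getAttribute($attr);
            if (strpos($value, '//') === 0) {
                // scheme-relative: assume https
                $node->setAttribute($attr, 'https:' . $value);
            } elseif (strpos($value, '/') === 0) {
                // root-relative: prefix with the site origin
                $node->setAttribute($attr, rtrim($origin, '/') . $value);
            }
            // absolute URLs and other relative forms are left untouched
        }
    }
    return $dom->saveHTML();
}
```

Even with the URLs fixed, resources may still be blocked by the original site (hotlink protection, CORS), which is the access problem mentioned above.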

It looks like what you are looking for is an HTML renderer that processes the website and gives you the true result. Just downloading the page and its assets is not enough; they need some basic processing (see also web crawlers/spiders).

How Browsers Work: Behind the scenes of modern web browsers
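As one concrete example of such a renderer: Chrome/Chromium in headless mode can load a page, run its JavaScript, and print the resulting DOM with the --dump-dom flag. The sketch below only builds the command line; it assumes a Chromium binary is installed under the name chromium (this differs between systems), and build_dump_dom_command is a hypothetical helper.

```php
<?php
// Sketch: build a shell command that asks headless Chromium to render a page
// and print the final DOM. escapeshellarg() stops the URL (user input) from
// injecting extra shell commands.
function build_dump_dom_command(string $binary, string $url): string
{
    return escapeshellarg($binary)
        . ' --headless --disable-gpu --dump-dom '
        . escapeshellarg($url);
}

// Usage (requires the browser to be installed on the server):
// $html = shell_exec(build_dump_dom_command('chromium', 'https://example.com/'));
```

The returned HTML can then be parsed with DOMDocument like any other string.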

Wolf
  • Hi, thanks for your answer! But how should I do this then? I understand it isn't easy, but there must be a way. Do you have any suggestions for how I could do this? –  Dec 19 '19 at 18:27
  • @JeroenvanRensen Try starting with the basic PHP web crawler example: https://stackoverflow.com/a/2313270/9142698 and see if that does what you need. For Google that should be enough; however, if you need to process dynamic websites, you will need to use an HTML rendering engine. When I needed one I used PhantomJS: https://phantomjs.org/ Edit: If you need an in-between solution, here is a full-featured PHP web crawler I keep finding: https://github.com/spatie/crawler – Wolf Dec 19 '19 at 18:32
  • Hi, I've tried your first link. I created this code, and unfortunately it also didn't work; it returned a blank web page. ` loadHTMLFile($url); $headings = $dom->getElementsByTagName('h3'); foreach ($headings as $content) { $content = $element->nodeValue; echo $content; } } crawl_page("http://google.com/search?q=test"); ` –  Dec 19 '19 at 18:41
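A corrected sketch of the loop in the last comment, assuming the goal is just to list the <h3> texts. In that snippet the loop variable is $content but the body reads $element->nodeValue, which is undefined, so nothing would be printed even if headings were found. The sketch parses an HTML string (so it can be tried without a network request) and uses textContent; the function name extract_h3_texts is hypothetical.

```php
<?php
// Sketch: collect the trimmed text of every <h3> element in an HTML string.
function extract_h3_texts(string $html): array
{
    $dom = new DOMDocument('1.0');
    libxml_use_internal_errors(true); // tolerate invalid real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $titles = [];
    foreach ($dom->getElementsByTagName('h3') as $heading) {
        // use the loop variable itself, not an undefined $element
        $titles[] = trim($heading->textContent);
    }
    return $titles;
}

// Usage with a fetched page:
// foreach (extract_h3_texts(file_get_contents($url)) as $title) {
//     echo htmlspecialchars($title), "<br>\n";
// }
```

If the HTML Google sends to a non-browser client contains no <h3> elements, this will still return an empty array, so it is worth saving the raw response and inspecting it first.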

It looks like you are trying to create a Google search box for your site. If so, I recommend using Google's Custom Search: https://developers.google.com/custom-search/docs/tutorial/introduction
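Besides the embeddable search box, Google offers the Custom Search JSON API, which returns results as JSON instead of HTML, so there is no markup to scrape. A minimal sketch, assuming you have created an API key and a search engine ID (the cx parameter); both values below are placeholders, and the helper names build_cse_url and parse_cse_items are hypothetical.

```php
<?php
// Sketch: build a Custom Search JSON API request URL.
// http_build_query() handles the URL encoding of every parameter.
function build_cse_url(string $apiKey, string $engineId, string $query): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $apiKey,
        'cx'  => $engineId,
        'q'   => $query,
    ]);
}

// Pull title/link pairs out of the JSON response body.
function parse_cse_items(string $json): array
{
    $data = json_decode($json, true);
    $results = [];
    foreach ($data['items'] ?? [] as $item) {
        $results[] = ['title' => $item['title'] ?? '', 'link' => $item['link'] ?? ''];
    }
    return $results;
}

// Usage (network request; key and cx are placeholders):
// $items = parse_cse_items(file_get_contents(build_cse_url('YOUR_KEY', 'YOUR_CX', 'php dom')));
```

Responses without an "items" key (for example, no results) simply produce an empty array.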

If you want to use your own code (it will be hard to get a complete solution that way):

I will give you a working example (not a complete solution).

Create a file named search_result.php and paste the following code inside PHP tags.

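require 'simple_html_dom.php'; // I am not sure what this file is for :)

// Read the submitted query ('' if the field is missing)
$str = isset($_POST["q"]) ? $_POST["q"] : '';

// urlencode() escapes spaces and special characters such as & and #
$url = "https://www.google.com/search?q=".urlencode($str);

$result = file_get_contents($url);

echo $result;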

And create a file named search.php and paste the following code into it:

<form action="search_result.php" method="post">
  <input type="text" name="q" class="field" id="keyword" placeholder="Search term..." required />
  <input type="submit" name="submit_search" class="search-btn" value="" />
</form>

NOTE: This is a working example but not a complete solution


Good luck