I'm hoping you can help me optimise and speed up a script I have written to retrieve data from multiple sources. Right now the script takes between 2 and 10 minutes to run, depending on the time of day.

Any and all help would be very much appreciated.

Explaining the script:
I use a foreach loop to run through five different URLs. For each URL, five POST requests are performed and the resulting data is written to a file.

What I'm hoping:
To improve the speed of the script, I was hoping it would be possible to put each of the five URLs in a separate function and then run those five functions in parallel.

One of the options I've read about is pcntl_fork. Could this be used to achieve what I need? If not, are there any other options?

My script:

foreach ($urls as $url) {
        // Five POST requests per URL; each response contains "record totali N".
        $firmato = file_get_html($url, false, stream_context_create($firmato_request));
        if (preg_match('(record totali ([\d]+))', $firmato, $count)) {
            $firmato_count = $count[1];
        }
        $inviato = file_get_html($url, false, stream_context_create($inviato_request));
        if (preg_match('(record totali ([\d]+))', $inviato, $count)) {
            $inviato_count = $count[1];
        }
        $positive = file_get_html($url, false, stream_context_create($positive_request));
        if (preg_match('(record totali ([\d]+))', $positive, $count)) {
            $positive_count = $count[1];
        }
        $negative = file_get_html($url, false, stream_context_create($negative_request));
        if (preg_match('(record totali ([\d]+))', $negative, $count)) {
            $negative_count = $count[1];
        }
        $total = file_get_html($url, false, stream_context_create($default_request));
        if (preg_match('(record totali ([\d]+))', $total, $count)) {
            $default_count = $count[1];
            $other_count = $firmato_count+$inviato_count+$positive_count+$negative_count-$default_count;
        }

        // Pick the cache directory that belongs to this source.
        if ($url == 'http://sourceOne.com/MessageServlet') {
            $cacheDir = 'cache/one/';
        } elseif ($url == 'http://sourceTwo.com/MessageServlet') {
            $cacheDir = 'cache/two/';
        } elseif ($url == 'http://sourceThree.com/MessageServlet') {
            $cacheDir = 'cache/three/';
        } elseif ($url == 'http://sourceFour.com/MessageServlet') {
            $cacheDir = 'cache/four/';
        } elseif ($url == 'http://sourceFive.com/MessageServlet') {
            $cacheDir = 'cache/five/';
        }
        // Write one table row per hour to the cache file.
        $cache_file = $cacheDir.'hour_'.sprintf('%02d', $previousHour).'.txt';
        $data = '<tr><td>'.sprintf('%02d', $previousHour).':00</td><td>'.$firmato_count.'</td><td>'.$inviato_count.'</td><td>'.$positive_count.'</td><td>'.$negative_count.'</td><td>'.$other_count.'</td></tr>';
        file_put_contents($cache_file, $data);
}
  • You will find some good answers about multithreading in PHP here: http://stackoverflow.com/questions/70855/how-can-one-use-multi-threading-in-php-applications – Oliver Feb 21 '17 at 12:03
  • In fact, there are too many possible answers to your question. E.g. you could use curl, which is a bit faster than file_get_contents. It would be even faster, though harder to implement, to write a bash script on your Linux machine that is only called by PHP. – Oliver Feb 21 '17 at 12:07
  • http://codereview.stackexchange.com/help/on-topic – mickmackusa Feb 21 '17 at 12:24

1 Answer

You can use curl_multi (http://php.net/manual/en/function.curl-multi-init.php), or an existing library such as https://github.com/jmathai/php-multi-curl, though such libraries are built on curl_multi anyway.
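
As a rough illustration of the curl_multi approach, here is a minimal sketch that sends several POST requests concurrently and collects the response bodies. The function names `fetchAllParallel` and `extractRecordCount` and the shape of the `$requests` array are assumptions made for this example, not part of the question's code. The POST payloads are assumed to be the same data currently passed through the stream contexts. Error handling is left out; `curl_multi_info_read` could be used to inspect per-transfer results.

```php
<?php
// Hypothetical helper: pull N out of "record totali N" in a response body.
function extractRecordCount(string $html): ?int
{
    return preg_match('/record totali (\d+)/', $html, $m) ? (int) $m[1] : null;
}

// Sketch: run several POST requests concurrently with curl_multi.
// $requests maps a label to ['url' => string, 'post' => array|string] (assumed shape).
function fetchAllParallel(array $requests, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($requests as $key => $req) {
        $ch = curl_init($req['url']);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => $req['post'],
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = curl_multi_getcontent($ch); // body string, or null if none
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $bodies;
}
```

With this sketch, all 25 requests (five URLs times five payloads) could be placed in one `$requests` array, for example keyed as `'one.firmato'`, and each returned body passed to `extractRecordCount`. The total run time would then be closer to that of the slowest single request than to the sum of all of them, provided the remote servers accept concurrent connections.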

Eimsas