I would like to get the contents of each URL in a list using fread and Fibers, where each stream does not have to wait for feof before another fread runs on another URL.

My current code is the following:

<?php

function getFiberFromStream($stream, $url): Fiber {
    return new Fiber(function ($stream) use ($url): void {
        while (!feof($stream)) {
            echo "reading 100 bytes from $url".PHP_EOL;
            $contents = fread($stream, 100);
            Fiber::suspend($contents);
        }
    });
}

function getContents(array $urls): array {

    $contents = [];

    foreach ($urls as $key => $url) {

        $stream = fopen($url, 'r');
        stream_set_blocking($stream, false);
        $fiber = getFiberFromStream($stream, $url);
        $content = $fiber->start($stream);

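        // NOTE: this inner loop drains the current fiber to completion,
        // so the next URL is not even opened until this one is fully read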
        while (!$fiber->isTerminated()) {
            $content .= $fiber->resume();
        }
        fclose($stream);

        $contents[$urls[$key]] = $content;
    }

    return $contents;
}

$urls = [
    'https://www.google.com/',
    'https://www.twitter.com',
    'https://www.facebook.com'
];

var_dump(getContents($urls));

Unfortunately, the echo calls in getFiberFromStream() show that this code waits for the entire content of one URL before moving on to the next:

reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.google.com //finished
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.twitter.com //finished
reading 100 bytes from https://www.facebook.com
[...]

I would like something like:

reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.facebook.com
reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.facebook.com
[...]
  • Are you aware you can use curl to execute [parallel requests](https://stackoverflow.com/questions/54003466/get-all-the-urls-using-multi-curl)? – Olivier Mar 30 '22 at 07:19
  • Yes, thanks but I would like to use Fibers for studying purposes – celsowm Mar 30 '22 at 11:34
  • Your code is waiting for the fiber to complete before starting the next one, so yes, it will run sequentially. You need to separate the starting from the waiting so that multiple fibers are running before you wait. – MikeT Mar 30 '22 at 15:02

1 Answer


The behaviour you see occurs because you poll the current fiber to full completion before moving on to the next fiber.

The solution is to start the fibers for all URLs at once, and only after that poll them.

Try something like this:


function getContents(array $urls): array {

    $contents = [];
    $fibers = [];

    // start them all up
    foreach ($urls as $key => $url) {

        $stream = fopen($url, 'r');
        stream_set_blocking($stream, false);
        $fiber = getFiberFromStream($stream, $url);
        $content = $fiber->start($stream);

        // save fiber context so we can process them later
        $fibers[$key] = [$fiber, $content, $stream];
    }

    // now poll
    $have_unterminated_fibers = true;
    while ($have_unterminated_fibers) {

        // first suppose we have no work to do
        $have_unterminated_fibers = false;

        // now loop over fibers to see if any is still working
        foreach ($fibers as $key => $item) {
            // fetch context
            $fiber = $item[0]; 
            $content = $item[1]; 
            $stream = $item[2];

            // don't loop to completion here,
            // just process the next chunk
            if (!$fiber->isTerminated()) {
                // yep, mark that we still have work left
                $have_unterminated_fibers = true;

                // update content in the context
                $content .= $fiber->resume();
                $fibers[$key][1] = $content;
            } else {
                if ($stream) {
                    fclose($stream);

                    // save result for return
                    $contents[$urls[$key]] = $content;

                    // mark the stream as closed in the context
                    // so it doesn't get closed twice
                    $fibers[$key][2] = null;
                }
            }
        }
    }

    return $contents;
}
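
One caveat worth noting: fopen() returns false when a URL cannot be opened, and the code above would then hand false to stream_set_blocking() and getFiberFromStream(). Below is a minimal sketch of a more defensive version of the "start them all up" loop; the error suppression and the skip-on-failure policy are my assumptions, not part of the original answer:

// sketch: drop-in replacement for the "start them all up" loop above,
// skipping any URL that fails to open instead of failing later
foreach ($urls as $key => $url) {
    $stream = @fopen($url, 'r');   // fopen() returns false on failure
    if ($stream === false) {
        continue;                  // skip this URL entirely
    }
    stream_set_blocking($stream, false);
    $fiber = getFiberFromStream($stream, $url);
    $content = $fiber->start($stream);
    $fibers[$key] = [$fiber, $content, $stream];
}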
