
As you can see below, I have a controller that pulls various data from Instagram and then calculates the engagement rate from it. Although this works quickly for small accounts, it becomes very slow and inefficient for large ones. I tried to speed it up in various ways, such as using the yield keyword, but since I'm new to PHP, I'm not even sure whether yield belongs in this code. Could you please help me figure out what to do? Thanks in advance.

This is my controller code:

<?php

namespace App\Http\Controllers;

use Illuminate\Support\Facades\Http;

class splenperAPIController extends Controller
{    
    public function splenperAPI()
    {
        $maxId = '';
        $response = Http::withHeaders([
            'cookie' => 'sessionid=1796659686%3A4Ojj1py72bZKql%3A7; csrftoken=fPYrPRD1vHB7LdS0DjKzOK4kGo4uYK9f; ds_user_id=1796659686; ig_did=25114CCD-7A9D-4971-88F1-1E04796D9F14; ig_nrcb=1; mid=YiqZZQALAAEhkff6N5T2oovGOBkz; rur=01f73b7ee19feca3df296fc45ee75179c49dd54efb93233307b58f19bf707fea5d458fe7; shbid=01f7998eadf528233f0e7331b327e5a107134084c64e229af2cc6577b8a2ce862e3798da; shbts=01f7caf607a6877ca8407e9cb85cd175736a87f8f75e9076d5bddb32cbcbac6d5da45e89',
            'x-ig-app-id' => '936619743392459',
            'Content-Type' => 'application/json',
        ])->get('https://www.instagram.com/elmaligroup/?__a=1');
        $response = $response->json();

        $userId = $response['graphql']['user']['id'];
        $followersCount = $response['graphql']['user']['edge_followed_by']['count'];
        $count = 12;

        $index = 0;
        $isMoreAvailable = true;

        $totalLikeCount = 0;
        $totalCommentCount = 0;

        while ($index < $count && $isMoreAvailable) {
            $variables = json_encode([
                'id' => $userId,
                "after" => $maxId,
                "first" => $count,
            ]);

            $variables = urlencode($variables);
            $response = Http::withHeaders([
                'cookie' => 'sessionid=1796659686%3A4Ojj1py72bZKql%3A7; csrftoken=fPYrPRD1vHB7LdS0DjKzOK4kGo4uYK9f; ds_user_id=1796659686; ig_did=25114CCD-7A9D-4971-88F1-1E04796D9F14; ig_nrcb=1; mid=YiqZZQALAAEhkff6N5T2oovGOBkz; rur=01f73b7ee19feca3df296fc45ee75179c49dd54efb93233307b58f19bf707fea5d458fe7; shbid=01f7998eadf528233f0e7331b327e5a107134084c64e229af2cc6577b8a2ce862e3798da; shbts=01f7caf607a6877ca8407e9cb85cd175736a87f8f75e9076d5bddb32cbcbac6d5da45e89',
                'x-ig-app-id' => '936619743392459',
                'Content-Type' => 'application/json',
            ])->get('https://www.instagram.com/graphql/query/?query_hash=e769aa130647d2354c40ea6a439bfc08&variables=' . $variables);
            
            
            
            for ($i = 0; $i < $count; $i++) {
                if ($i == count($response->json()['data']['user']['edge_owner_to_timeline_media']['edges'])) {
                    break;
                }
                $totalCommentCount += $response->json()['data']['user']['edge_owner_to_timeline_media']['edges'][$i]['node']['edge_media_to_comment']['count'];
                $totalLikeCount += $response->json()['data']['user']['edge_owner_to_timeline_media']['edges'][$i]['node']['edge_media_preview_like']['count'];
                $userName = $response->json()['data']['user']['edge_owner_to_timeline_media']['edges'][$i]['node']['owner']['username'];
                $index++;

            }

            $maxId = $response->json()['data']['user']['edge_owner_to_timeline_media']['page_info']['end_cursor'];
            $isMoreAvailable = $response->json()['data']['user']['edge_owner_to_timeline_media']['page_info']['has_next_page'];
            if ($isMoreAvailable) {
                $index = 0;
            }

        }
        $mediaCount = $response->json()['data']['user']['edge_owner_to_timeline_media']['count'];



        echo $followersCount . '<br>';
        echo $totalLikeCount . '<br>';
        echo $totalCommentCount . '<br>';
        echo ($totalLikeCount + $totalCommentCount) . '<br>';
        echo $userName . '<br>';
        echo $userId . '<br>';
        echo $mediaCount . '<br>';

        $percent = ($totalLikeCount + $totalCommentCount) / $followersCount * 100;
        echo "engagementRate: " . number_format($percent, 2, ',', '.') . '%';

    }
    
}
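The engagement rate the controller prints is (likes + comments) / followers × 100. One way to make that part easy to test, and to confirm it is not where the time goes, is to pull the per-page aggregation out of the HTTP code. A minimal sketch in plain PHP, assuming the decoded response has the `data.user.edge_owner_to_timeline_media` shape the controller reads; the function names `aggregatePage` and `engagementRate` are hypothetical, the zero-follower guard is an addition, and the sample page is hand-made rather than real Instagram data:

```php
<?php

/**
 * Sums likes and comments over the posts in one decoded GraphQL page.
 * Assumes the array shape the controller reads:
 * data.user.edge_owner_to_timeline_media.{edges, page_info}.
 *
 * @return array{likes:int, comments:int, endCursor:?string, hasNextPage:bool}
 */
function aggregatePage(array $page): array
{
    $media = $page['data']['user']['edge_owner_to_timeline_media'];
    $likes = 0;
    $comments = 0;

    foreach ($media['edges'] as $edge) {
        $likes += $edge['node']['edge_media_preview_like']['count'];
        $comments += $edge['node']['edge_media_to_comment']['count'];
    }

    return [
        'likes' => $likes,
        'comments' => $comments,
        'endCursor' => $media['page_info']['end_cursor'],
        'hasNextPage' => $media['page_info']['has_next_page'],
    ];
}

/** Engagement rate in percent: (likes + comments) / followers * 100. */
function engagementRate(int $likes, int $comments, int $followers): float
{
    if ($followers <= 0) {
        return 0.0; // added guard: avoid division by zero for accounts with no followers
    }
    return ($likes + $comments) / $followers * 100;
}

// Hand-made sample page (not real Instagram data).
$page = [
    'data' => ['user' => ['edge_owner_to_timeline_media' => [
        'count' => 2,
        'edges' => [
            ['node' => ['edge_media_preview_like' => ['count' => 120], 'edge_media_to_comment' => ['count' => 30]]],
            ['node' => ['edge_media_preview_like' => ['count' => 80], 'edge_media_to_comment' => ['count' => 20]]],
        ],
        'page_info' => ['end_cursor' => null, 'has_next_page' => false],
    ]]],
];

$totals = aggregatePage($page);
echo number_format(engagementRate($totals['likes'], $totals['comments'], 1000), 2, ',', '.') . "%\n";
```

In the controller, `aggregatePage($response->json())` would take the place of the inner `for` loop, with the totals summed across pages.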
  • How long does this take to run? It can really only go as fast as Instagram can return data, so there may not be any meaningful way to speed it up further. You could put all of this code in a job that keeps running and stores the results in your own database, which you could then read much faster on your site – GrumpyCrouton May 23 '22 at 20:29
  • It depends on the size of the account. For example, it takes about 1 hour for Cristiano Ronaldo. Even for medium-sized accounts, the time is not negligible – Yusuf Doğan May 23 '22 at 20:31
  • Maybe using async requests with Guzzle could help. – TEFO May 23 '22 at 21:28
  • I'm assuming it is the `while` loop that is the slowest, but you are also concatenating a string, which could eventually introduce memory pressure. To start with, and to confirm, [time](https://stackoverflow.com/a/9288945/231316) each iteration of the `while` loop, and possibly track your [overall memory growth](https://stackoverflow.com/a/16239377/231316). Then consider putting each individual item from the `while` loop into a queue and having a server-side task process that queue. One advantage is that you can run multiple task runners, which should make it faster. – Chris Haas May 23 '22 at 21:29
  • @TEFO No, it didn't work. – Yusuf Doğan May 23 '22 at 21:32
  • @ChrisHaas Can I use the yield keyword? In this case, it seems like the most logical approach to me. – Yusuf Doğan May 23 '22 at 21:35
  • You are requesting data from a URL, so of course it's "slow". Say it manages 10 requests per second (0.1 s per request), which is fairly optimistic: then you can fetch 600 data packets per minute, not counting processing. For operations like this you always have to account for network latency. Whether parallel access is feasible is questionable; I can imagine that Instagram doesn't like it when you hammer their API with dozens of simultaneous requests... – Honk der Hase May 23 '22 at 21:58
  • “Can I use the yield keyword?” Okay, imagine that instead of a `while` loop you had a function called `getUrls()` which returned an array you could `foreach` over. If creating that array was “expensive”, a `yield` might be helpful. But for you, 99% of your time is probably spent in network requests. – Chris Haas May 24 '22 at 02:56
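On the `yield` question: a generator can hand pages to the caller one at a time, but each page still costs a full round trip, and because every request needs the `end_cursor` from the previous response, the paginated requests cannot simply be fired in parallel either. A minimal sketch, assuming a hypothetical `$fetchPage` callable that would wrap the Laravel HTTP call and return the decoded array; a fake in-memory fetcher stands in for Instagram so the sketch runs on its own:

```php
<?php

/**
 * Lazily yields decoded pages, following end_cursor until has_next_page is false.
 * $fetchPage is a hypothetical callable fn(string $cursor): array; in the controller
 * it would wrap Http::withHeaders(...)->get(...)->json().
 */
function timelinePages(callable $fetchPage): Generator
{
    $cursor = '';
    do {
        $page = $fetchPage($cursor);            // one network round trip per page
        yield $page;
        $info = $page['data']['user']['edge_owner_to_timeline_media']['page_info'];
        $cursor = (string) $info['end_cursor']; // the next request depends on this value
    } while ($info['has_next_page']);
}

// Fake fetcher serving three pages from memory instead of calling Instagram.
$fakeFetch = function (string $cursor): array {
    $pages = [
        ''   => ['likes' => 10, 'next' => 'c1'],
        'c1' => ['likes' => 20, 'next' => 'c2'],
        'c2' => ['likes' => 30, 'next' => null],
    ];
    $p = $pages[$cursor];
    return ['data' => ['user' => ['edge_owner_to_timeline_media' => [
        'edges' => [['node' => ['edge_media_preview_like' => ['count' => $p['likes']],
                                'edge_media_to_comment' => ['count' => 1]]]],
        'page_info' => ['end_cursor' => $p['next'], 'has_next_page' => $p['next'] !== null],
    ]]]];
};

$totalLikes = 0;
$pagesSeen = 0;
foreach (timelinePages($fakeFetch) as $page) {
    $pagesSeen++;
    foreach ($page['data']['user']['edge_owner_to_timeline_media']['edges'] as $edge) {
        $totalLikes += $edge['node']['edge_media_preview_like']['count'];
    }
}
echo "$pagesSeen pages, $totalLikes likes\n";
```

Replacing the `while` loop with a generator like this changes the structure, not the speed: the total time is still roughly the number of pages times the latency of one request.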

2 Answers


I've added a basic benchmark using microtime() and memory_get_usage(). While the script runs (if you can tail the Laravel log) or after it finishes, you will see per-request timings and memory usage, plus overall stats at the end. Then update the question and we can think about next steps.

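// Log::debug() below requires: use Illuminate\Support\Facades\Log; at the top of the controller file
public function splenperAPI()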
  {
    $workStart = microtime(true);

    $headers = [
      'cookie' => 'sessionid=1796659686%3A4Ojj1py72bZKql%3A7; csrftoken=fPYrPRD1vHB7LdS0DjKzOK4kGo4uYK9f; ds_user_id=1796659686; ig_did=25114CCD-7A9D-4971-88F1-1E04796D9F14; ig_nrcb=1; mid=YiqZZQALAAEhkff6N5T2oovGOBkz; rur=01f73b7ee19feca3df296fc45ee75179c49dd54efb93233307b58f19bf707fea5d458fe7; shbid=01f7998eadf528233f0e7331b327e5a107134084c64e229af2cc6577b8a2ce862e3798da; shbts=01f7caf607a6877ca8407e9cb85cd175736a87f8f75e9076d5bddb32cbcbac6d5da45e89',
      'x-ig-app-id' => '936619743392459',
      'Content-Type' => 'application/json',
    ];
    $maxId = '';
    $response = Http::withHeaders($headers)->get('https://www.instagram.com/elmaligroup/?__a=1');
    $response = $response->json();

    $userId = $response['graphql']['user']['id'];
    $followersCount = $response['graphql']['user']['edge_followed_by']['count'];
    $count = 12;

    $index = 0;
    $isMoreAvailable = true;

    $totalLikeCount = 0;
    $totalCommentCount = 0;
    $userName = ''; // defined even if the account returns no posts
    // debug vars
    $requestsCount = 0;
    $slowestRequestTime = 0;
    $totalRequestsTime = 0;
    $totalProcessingDataTime = 0;
    // debug vars end

    while ($index < $count && $isMoreAvailable) {
      $variables = urlencode(
        json_encode([
          'id' => $userId,
          "after" => $maxId,
          "first" => $count,
        ])
      );

      // we'll log requests durations
      $startTime = microtime(true);
      $response = Http::withHeaders($headers)
        ->get("https://www.instagram.com/graphql/query/?query_hash=e769aa130647d2354c40ea6a439bfc08&variables=$variables");
      $endTime = microtime(true);

      $requestsCount++;
      $requestTime = round($endTime - $startTime, 6);
      $totalRequestsTime += $requestTime;
      if ($requestTime > $slowestRequestTime) $slowestRequestTime = $requestTime;

      // also we'll log processing requested data duration 
      $startTime = microtime(true);
      for ($i = 0; $i < $count; $i++) {
        if ($i == count($response->json()['data']['user']['edge_owner_to_timeline_media']['edges'])) {
          break;
        }
        $totalCommentCount += $response->json()['data']['user']['edge_owner_to_timeline_media']['edges'][$i]['node']['edge_media_to_comment']['count'];
        $totalLikeCount += $response->json()['data']['user']['edge_owner_to_timeline_media']['edges'][$i]['node']['edge_media_preview_like']['count'];
        $userName = $response->json()['data']['user']['edge_owner_to_timeline_media']['edges'][$i]['node']['owner']['username'];
        $index++;
      }

      $maxId = $response->json()['data']['user']['edge_owner_to_timeline_media']['page_info']['end_cursor'];
      $isMoreAvailable = $response->json()['data']['user']['edge_owner_to_timeline_media']['page_info']['has_next_page'];
      if ($isMoreAvailable) {
        $index = 0;
      }
      $endTime = microtime(true);
      $processingDataTime = round($endTime - $startTime, 6);
      $totalProcessingDataTime += $processingDataTime;
      Log::debug('speed', [
        'request time' => $requestTime,
        'process data time' => $processingDataTime,
        'current memory usage (MB)' => memory_get_usage() / 1048576
      ]);
    }
    $workDuration = round(microtime(true) - $workStart, 6);
    // and the final log to get total data
    Log::debug('speed', [
      'total requests' => $requestsCount,
      'total network time' => $totalRequestsTime,
      'slowest' => $slowestRequestTime,
      'average' => round($totalRequestsTime / $requestsCount, 6),
      'total processing time' => $totalProcessingDataTime,
      'total work time' => $workDuration,
      'memory peak (MB)' =>  memory_get_peak_usage() / 1048576
    ]);
    $mediaCount = $response->json()['data']['user']['edge_owner_to_timeline_media']['count'];

    echo $followersCount . '<br>';
    echo $totalLikeCount . '<br>';
    echo $totalCommentCount . '<br>';
    echo ($totalLikeCount + $totalCommentCount) . '<br>';
    echo $userName . '<br>';
    echo $userId . '<br>';
    echo $mediaCount . '<br>';

    $percent = ($totalLikeCount + $totalCommentCount) / $followersCount * 100;
    echo "engagementRate: " . number_format($percent, 2, ',', '.') . '%';
  }
Ol D. Castor
  • Where can I see the log details? – Yusuf Doğan May 24 '22 at 18:46
  • @YusufDoğan In `project_folder/storage/logs`. The file name depends on your log config (most commonly laravel.log or laravel-yyyy-mm-dd.log); in general, look at the most recently updated file. – Ol D. Castor May 24 '22 at 20:45
  • **This is the result, @OlD.Castor:** `[2022-05-25 00:08:55] local.DEBUG: speed {"total requests":23,"total network time":43.352072,"slowest":6.749108,"average":1.884873,"total processing time":0.038909,"total work time":46.302291,"memory peak (MB)":20.059410095214844}` – Yusuf Doğan May 25 '22 at 00:13
  • I think the issue is Instagram's GraphQL API: they limit requests to 5000 per hour, and with a big account the API does not respond fast enough.
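The logged numbers already show where the time goes. A small sketch that decodes the context JSON from that log line and splits the total: nearly all of it is network time, and data processing is well under a second. `$workStart` is set before the initial `?__a=1` profile request, which the per-request timer does not cover, so the remaining few seconds probably include that request plus logging overhead (an interpretation of the benchmark code, not something the log states):

```php
<?php

// Context JSON copied verbatim from the log line above.
$stats = json_decode(
    '{"total requests":23,"total network time":43.352072,"slowest":6.749108,'
    . '"average":1.884873,"total processing time":0.038909,'
    . '"total work time":46.302291,"memory peak (MB)":20.059410095214844}',
    true
);

$work = $stats['total work time'];
$networkShare = $stats['total network time'] / $work * 100;       // percent of wall time
$processingShare = $stats['total processing time'] / $work * 100; // percent of wall time
// Time not covered by either timer (profile request, log writes, framework overhead).
$otherSeconds = $work - $stats['total network time'] - $stats['total processing time'];

printf(
    "network: %.1f%%, processing: %.2f%%, other: %.2f s\n",
    $networkShare,
    $processingShare,
    $otherSeconds
);
```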

I believe the big problem here is the number of HTTP requests you are making. I think you should increase the number of posts requested per call: that reduces the number of HTTP requests and will probably speed up your code.

You set it with `$count = 12;`.
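The effect of the page size is simple arithmetic: reading N posts takes ceil(N / first) sequential requests, and the total wait is roughly that count times the average request latency. A sketch with assumed figures (3,000 posts, about 1.9 s per request); the helper names are hypothetical, and whether Instagram actually honours a larger `first` value is not confirmed here:

```php
<?php

/** Number of paginated requests needed to read $mediaCount posts, $perPage at a time. */
function requestsNeeded(int $mediaCount, int $perPage): int
{
    return intdiv($mediaCount + $perPage - 1, $perPage); // integer ceiling division
}

/** Rough wall-clock estimate when requests run one after another. */
function estimatedSeconds(int $mediaCount, int $perPage, float $avgLatencySeconds): float
{
    return requestsNeeded($mediaCount, $perPage) * $avgLatencySeconds;
}

// Assumed account: 3,000 posts, ~1.9 s per request.
foreach ([12, 50] as $perPage) {
    printf(
        "first=%d: %d requests, ~%.0f s\n",
        $perPage,
        requestsNeeded(3000, $perPage),
        estimatedSeconds(3000, $perPage, 1.9)
    );
}
```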

I refactored your code, set `$maxPerPage = 50`, and tried to remove some unnecessary counting and repeated calls to `$response->json()`.

    $headers = [
        'cookie' => 'sessionid=1796659686%3A4Ojj1py72bZKql%3A7; csrftoken=fPYrPRD1vHB7LdS0DjKzOK4kGo4uYK9f; ds_user_id=1796659686; ig_did=25114CCD-7A9D-4971-88F1-1E04796D9F14; ig_nrcb=1; mid=YiqZZQALAAEhkff6N5T2oovGOBkz; rur=01f73b7ee19feca3df296fc45ee75179c49dd54efb93233307b58f19bf707fea5d458fe7; shbid=01f7998eadf528233f0e7331b327e5a107134084c64e229af2cc6577b8a2ce862e3798da; shbts=01f7caf607a6877ca8407e9cb85cd175736a87f8f75e9076d5bddb32cbcbac6d5da45e89',
        'x-ig-app-id' => '936619743392459',
        'Content-Type' => 'application/json',
    ];
    $response = Http::withHeaders($headers)->get('https://www.instagram.com/elmaligroup/?__a=1');
    $response = $response->json();

    $userId = $response['graphql']['user']['id'];
    $followersCount = $response['graphql']['user']['edge_followed_by']['count'];
    $maxPerPage = 50;
    $endCursor = '';
    $hasNextPage = true;
    $totalLikeCount = 0;
    $totalCommentCount = 0;
    $userName = '';
    $mediaCount = 0;

    while ($hasNextPage) {

        $variables = urlencode(json_encode([
            "id" => $userId,
            "after" => $endCursor,
            "first" => $maxPerPage,
        ]));

        $response = Http::withHeaders($headers)->get('https://www.instagram.com/graphql/query/?query_hash=e769aa130647d2354c40ea6a439bfc08&variables=' . $variables);


        $edges = $response->json()['data']['user']['edge_owner_to_timeline_media'];
        $totalOfEdges = count($edges['edges']);

        for ($i = 0; $i < $totalOfEdges; $i++) {
            $totalCommentCount += $edges['edges'][$i]['node']['edge_media_to_comment']['count'];
            $totalLikeCount += $edges['edges'][$i]['node']['edge_media_preview_like']['count'];
            $userName = $edges['edges'][$i]['node']['owner']['username'];
        }

        $endCursor = $edges['page_info']['end_cursor'];
        $hasNextPage = $edges['page_info']['has_next_page'];
        $mediaCount = $edges['count'];
    }

    echo $followersCount . '<br>';
    echo $totalLikeCount . '<br>';
    echo $totalCommentCount . '<br>';
    echo ($totalLikeCount + $totalCommentCount) . '<br>';
    echo $userName . '<br>';
    echo $userId . '<br>';
    echo $mediaCount . '<br>';

    $percent = ($totalLikeCount + $totalCommentCount) / $followersCount * 100;
    echo "engagementRate: " . number_format($percent, 2, ',', '.') . '%';
rjsandim
  • Nice idea, thanks, but it didn't work. Maybe I need to use the yield keyword. – Yusuf Doğan May 23 '22 at 23:05
  • I don't think you have a memory issue: you are not creating a lot of objects, you are aggregating data. I think you have a performance problem because you are calling an API many times. Add a global counter to see how many times you call the Instagram API, and a timer that starts before each API call and stops after it returns, keeping a running sum of the time. That will probably reveal the bottlenecks in your algorithm. I use this to track it: `$start = microtime(true);`, then call the API, and then `$end = microtime(true); $executionTime = number_format($end - $start, 2);` – rjsandim May 24 '22 at 04:45
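The counter-plus-timer idea can be packaged as a small wrapper so every API call is measured the same way. A sketch for PHP 8 (it uses the `mixed` return type and `hrtime()`); the class name `CallStats` is hypothetical, and `usleep()` stands in for the Instagram request so the example runs offline:

```php
<?php

/**
 * Counts calls and accumulates elapsed time around any callable
 * (a "global counter + summed timer"). Hypothetical helper class.
 */
final class CallStats
{
    public int $calls = 0;
    public float $totalSeconds = 0.0;
    public float $slowestSeconds = 0.0;

    public function measure(callable $fn): mixed
    {
        $start = hrtime(true); // monotonic clock, in nanoseconds
        try {
            return $fn();
        } finally {
            // Runs even if $fn throws, so failed requests are timed too.
            $elapsed = (hrtime(true) - $start) / 1e9;
            $this->calls++;
            $this->totalSeconds += $elapsed;
            $this->slowestSeconds = max($this->slowestSeconds, $elapsed);
        }
    }

    public function average(): float
    {
        return $this->calls > 0 ? $this->totalSeconds / $this->calls : 0.0;
    }
}

// usleep() simulates ~10 ms of network latency per call.
$stats = new CallStats();
for ($i = 0; $i < 3; $i++) {
    $stats->measure(fn () => usleep(10_000));
}
printf(
    "%d calls, total %.3f s, slowest %.3f s, average %.3f s\n",
    $stats->calls,
    $stats->totalSeconds,
    $stats->slowestSeconds,
    $stats->average()
);
```

In the controller, `$stats->measure(fn () => Http::withHeaders($headers)->get($url))` would return the response while recording its duration, where `$url` is the GraphQL URL built inside the loop.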