2

I've built a code search application that talks to GitHub's API, and I want to add pagination to it. The pagination data is returned in the Link response header, like so:

Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next", <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"

My code:

    // API CONNECTION
    $url = 'https://api.github.com/search/code?q=' . $term  . '+language:' . $lang . '&per_page=' . $pp;
    $cInit = curl_init();
    curl_setopt($cInit, CURLOPT_URL, $url);
    curl_setopt($cInit, CURLOPT_RETURNTRANSFER, 1); // 1 = TRUE
    curl_setopt($cInit, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
    curl_setopt($cInit, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($cInit, CURLOPT_USERPWD, $user . ':' . $pwd);
    curl_setopt($cInit, CURLOPT_HTTPHEADER, array('Accept: application/vnd.github.v3.text-match+json')); // ADD THE HIGHLIGHTED CODE SECTION

    // MAKE CURL OUTPUT READABLE
    $output = curl_exec($cInit);
    $items = json_decode($output, true); 
    curl_close($cInit); // CLOSE OUR API CONNECTION

Now I've added curl_setopt($cInit, CURLOPT_HEADER, true);

Since then, var_dump($items), which worked before I added CURLOPT_HEADER, returns NULL, which in turn breaks the entire project.

While debugging, I found that var_dump($output) still outputs data and, as expected, now includes the headers. However, the Link header looks like this:

Link: ; rel="next", ; rel="last"

which is not what it should be. As far as I can tell, the Link header itself is what has broken my code.

I've tried various things, such as calling urlencode() on $output before decoding it, but to no avail. How do I fix this?

Krono
  • @StefanAvramovic Like I said, I would like to get started with paginating the data. The data I need is inside the header, so I need to output it before I can use it, but it's not outputting correctly. – Krono Feb 09 '18 at 07:09
  • Check out this post: https://stackoverflow.com/questions/9183178/can-php-curl-retrieve-response-headers-and-body-in-a-single-request – Stefan Avramovic Feb 09 '18 at 08:33

3 Answers

3

Setting curl_setopt($cInit, CURLOPT_HEADER, true); (or 1 instead of true) means that instead of just getting the body back, the $output variable also includes the headers. This is why trying to json_decode() it doesn't work - with the headers at the top, it's no longer a valid JSON string.

This SO question has more details on the various ways you can try and parse out the headers from your body, depending on the needs of your server. If you're not using proxies, redirects or anything odd, then the accepted answer from that question may work for you (adapted for your variables):

    $header_size = curl_getinfo($cInit, CURLINFO_HEADER_SIZE);
    $header = substr($output, 0, $header_size);
    $body = substr($output, $header_size);
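To make the effect concrete, here is a minimal sketch of the split applied to a simulated response, so it runs without a network call. The splitResponse() helper and the sample headers and JSON are made up for illustration. With a real handle, the header size would come from curl_getinfo($cInit, CURLINFO_HEADER_SIZE), which has to be read before curl_close() is called.

```php
<?php
// Hypothetical helper: split a raw cURL response (captured with
// CURLOPT_HEADER enabled) into its header block and its body.
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'header' => substr($raw, 0, $headerSize),
        'body'   => substr($raw, $headerSize),
    ];
}

// Simulated response standing in for curl_exec() output; the values are invented.
$headerBlock = "HTTP/1.1 200 OK\r\n"
    . "Content-Type: application/json\r\n"
    . "Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel=\"next\"\r\n"
    . "\r\n";
$raw = $headerBlock . '{"total_count": 1, "items": []}';

// Assumption: stands in for curl_getinfo($cInit, CURLINFO_HEADER_SIZE).
$headerSize = strlen($headerBlock);

$parts  = splitResponse($raw, $headerSize);
$items  = json_decode($parts['body'], true); // decodes, because only the body is passed
$broken = json_decode($raw, true);           // null, because the headers are not JSON
```

Decoding $parts['body'] rather than the full $output is what restores the original var_dump($items) behaviour.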

If you're concerned that you don't know GitHub's infrastructure or what they might change on you (the GitHub search documentation does warn that it may change without advance notice, after all), then you may be better off using the CURLOPT_HEADERFUNCTION option, which lets you assign a callback that cURL invokes for each header line that comes back from the request. The PHP documentation describes the value this option expects as:

A callback accepting two parameters. The first is the cURL resource, the second is a string with the header data to be written. The header data must be written by this callback. Return the number of bytes written.

The SO question mentioned above has examples of this. The callback can be one of the usual trivial cases (a named function, or a PHP callable array), or even a closure which populates a global $headers array.
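As a rough sketch of the closure variant, the following collects headers into an array passed by reference. The makeHeaderCollector() name and the simulated header lines are assumptions for illustration; with a real request the callback would be registered via curl_setopt($cInit, CURLOPT_HEADERFUNCTION, $collector) and CURLOPT_HEADER could stay off, so $output remains pure JSON.

```php
<?php
// Hypothetical factory: returns a callback suitable for CURLOPT_HEADERFUNCTION
// that stores each "Name: value" line in $headers, keyed by lowercased name.
function makeHeaderCollector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // A repeated header name overwrites the earlier value in this sketch.
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
        // cURL requires the number of bytes handled to be returned.
        return strlen($line);
    };
}

$headers   = [];
$collector = makeHeaderCollector($headers);

// Simulated header lines, fed to the callback the way cURL would deliver them.
$lines = [
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: application/json\r\n",
    "Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel=\"next\"\r\n",
    "\r\n",
];
foreach ($lines as $line) {
    $collector(null, $line);
}
```

The status line and the blank terminator contain no colon, so they are skipped, and $headers['link'] ends up holding the raw Link value.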

Having tested these methods, the Link header showed up correctly for me whenever there was more than one page of results. If there was only one page (or no results), the Link header was omitted from the GitHub response entirely.
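Once the Link value is available, it still has to be turned into page URLs. The following is one possible sketch, using the example header from the question; the parseLinkHeader() name and the regular expression are my own assumptions and only cover the simple `<url>; rel="name"` form shown there.

```php
<?php
// Hypothetical helper: map each rel value ("next", "last", ...) to its URL.
function parseLinkHeader(string $value): array
{
    $links = [];
    preg_match_all('/<([^>]+)>\s*;\s*rel="([^"]+)"/', $value, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $links[$m[2]] = $m[1]; // $m[1] is the URL, $m[2] is the rel name
    }
    return $links;
}

// The header value from the question.
$link = '<https://api.github.com/user/repos?page=3&per_page=100>; rel="next", '
      . '<https://api.github.com/user/repos?page=50&per_page=100>; rel="last"';

$links = parseLinkHeader($link);

// Pull the page number out of the "last" URL to know how many pages exist.
parse_str((string) parse_url($links['last'], PHP_URL_QUERY), $query);
$lastPage = (int) $query['page'];
```

Because the header is omitted when there is a single page, calling code should check isset($links['next']) before following it.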

Without knowing what you're using for $term, $lang and $pp, it's hard to say more about your specific case. Since you're also using a $user and $pwd combo for authorization, the results may differ from what the regular API endpoints return for publicly consumable data. I would first check with search queries that you know return many pages of results from public repositories.

Last but not least, if you're writing an application to consume the GitHub API, I suggest standing on the shoulders of those who have been there before. For example, KNP Labs have a GitHub API wrapper for PHP which is very popular (with documentation on search and pagination), or if you're using Laravel there's a wrapper by Graham Campbell.

Leith
  • I'm aware of the libraries, but for this project I was instructed to do it from scratch for learning purposes. – Krono Feb 11 '18 at 02:53
  • I've accepted your answer. I used `CURLOPT_HEADERFUNCTION` and got it working. It's time for me to figure out my next step. – Krono Feb 14 '18 at 01:11
0

However, the Link Header looks like this:

Link: ; rel="next", ; rel="last"

That sounds like you're looking at the output in your browser, and it interpreted the link URL between < and > as an HTML tag instead of displaying it as text.

Use var_dump instead of echo for debugging. Alternatively, simply use the browser's source view.
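If the output has to be inspected in the rendered page rather than in the source view, escaping it first keeps the angle brackets visible. This is a minimal sketch; the forBrowser() helper is a made-up name, and the sample header is the one from the question.

```php
<?php
// Hypothetical debug helper: escape raw response text so a browser
// shows "<" and ">" literally instead of treating them as a tag.
function forBrowser(string $raw): string
{
    return '<pre>' . htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

$sample  = 'Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next"';
$escaped = forBrowser($sample);

// In a page this would be: echo forBrowser($output);
```

The escaped string is for display only. Parsing and json_decode() should still work on the unescaped data.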

cweiske
  • I am using var_dump, as in my post: `var_dump($output)`. And yes, this is a web page based application, not a CLI application. I thought it had something to do with `<` `>`. Is there a way around this at all? – Krono Feb 09 '18 at 14:27
  • I also had problems, probably with the `<` `>` characters; looking at the source code solved it for me. @Krono `var_dump` did not work for me either. – bene-we Feb 07 '21 at 10:04
0

I had the same issue with pagination in the Shopify API; the headers worked until the Link element, which only returned a six-character string (<https). These two lines of code worked for me (note that I used htmlspecialchars() on the second one so that I could view it in my browser using var_dump($header)):

    $header_size = curl_getinfo($cInit, CURLINFO_HEADER_SIZE);
    $header = htmlspecialchars(substr($output, 0, $header_size));
AndyG