
I am working with an API (using cURL and PHP) that is supposed to return a batch of CSV records for a given date. First, you make a request that returns the number of records that will be in the batch. The second request actually retrieves the records. My problem is that the second request does NOT return the number of records the first request says it should. Regardless of the date selected, I ALWAYS get 7167 records (the number of records for any particular day will be over 8000). The content length I receive is what the header states I should get, and I do not get any errors. Here is the rub: if I put the same request URL in a browser's address bar, I get ALL the records I am supposed to get.

The script runs on a Linux platform; in fact, I have tried it on another Linux server with the same results. I have tried changing the execution time and timeout settings. I am really baffled. Thank you in advance for your suggestions.

<?php

error_reporting(E_ALL);
ini_set('display_errors', 1);
set_time_limit(0);  // no limit

$username = "*********";
$pwd = "*********";

$TotalPages = 0;
$CurrentPage = 0;
$PageSize = 1;
$RecordCount = 0;
$TotalRecordsProcessed = 0;

$today = date( "Y-m-d" );
$tomorrow = date( 'Y-m-d', strtotime( "+1 days" ) );

echo "today " . $today . "<br>";
echo "tomorrow: " . $tomorrow . "<br>";

$urlCount = "https://*********&fromDate=". $today . "&toDate=" . $tomorrow;

$headers = array("Content-Type:text/plain", 'Connection: Keep-Alive');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $urlCount );// Get the count of the records we asked for
curl_setopt($ch, CURLOPT_HEADER, true );
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true );
curl_setopt($ch, CURLOPT_TIMEOUT, 6000);
curl_setopt($ch, CURLOPT_VERBOSE, true );
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $pwd );

$result = curl_exec( $ch );                             // execute the curl function

$lines = explode( "\r\n", $result );
$idx = 0;
while( $lines[ $idx ] != null ) $idx++;     // find the empty line
$RecordCount = $lines[ $idx + 1 ];          // This is the index to the number of records in the set
$PageSize = $RecordCount;

echo "Records in set: " . $RecordCount . "<br>";

$url = "https://***********&fromDate=2015-10-19&toDate=2015-10-20&pageSize=" . $PageSize . "&pageNumber=" . $CurrentPage . "&timezone=Eastern&outputType=csv";

curl_setopt($ch, CURLOPT_URL, $url );       // set the url for getting the records

$result = curl_exec( $ch );                 // execute the function

$info = curl_getinfo($ch);

curl_close( $ch );

$size_download = $info["size_download"];
echo "Http Code: " . $info['http_code'] . "<br>";
echo "Size download: " . $info["size_download"] . "<br>";

// Make sure the length of the content received is the same as the "size_download" from the http header
$pos = strpos( $result, "UUID" ) - 2;       // find the beginning of the content
$result = substr( $result , $pos, -1 );     // remove the header data...keep just the content
$len = strlen( $result );                   // get the length of the received content
if( $len != $size_download )
{
    echo "length:" . strlen( $result ) . "<br>";
    echo "Content recieved not = size_download<br>";
}

// Check to make sure records downloaded match the number of records that the first curl request says there are
// Records are in csv format

$lines = explode( "\r\n", $result );
$RecordsReceived = count( $lines );
if( $RecordsReceived != $RecordCount )
{
    echo "Record Count = " . $RecordCount . " RecordsReceived = " . $RecordsReceived . "<br>";
}

?>
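One way to see exactly which request headers cURL sends (so they can be compared against what the browser sends for the same URL) is to route cURL's verbose log into a stream and read it back. A minimal sketch; `fetch_with_trace()` and `extract_request_lines()` are hypothetical helpers, and the real API URL would go where the placeholder is:

```php
<?php
// Sketch: capture cURL's verbose log so the outgoing request headers
// can be compared with what a browser sends for the same URL.

// Pull just the outgoing request lines ("> ..." in the verbose log).
function extract_request_lines(string $verbose): array
{
    $out = [];
    foreach (preg_split('/\r?\n/', $verbose) as $line) {
        if (strncmp($line, '> ', 2) === 0) {
            $out[] = substr($line, 2);
        }
    }
    return $out;
}

function fetch_with_trace(string $url): array
{
    $trace = fopen('php://temp', 'w+');       // in-memory stream for the log
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $trace); // send verbose output to the stream
    $body = curl_exec($ch);
    curl_close($ch);
    rewind($trace);
    $verbose = stream_get_contents($trace);
    fclose($trace);
    return [$body, extract_request_lines($verbose)];
}
```

Printing the second element of the returned pair shows the request line and every header cURL actually sent, which can then be diffed against the browser's request as shown in its developer tools.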
Shimon
  • Just as a first suggestion, are you counting lines properly? It seems like `$RecordCount = count($lines)` would work better than looping through an 8000-item array looking for `null`, which is not likely to be found. – miken32 Oct 23 '15 at 19:06
  • Also I'd try dumping the curl result to file and see if its contents match what you're expecting. – miken32 Oct 23 '15 at 19:09
  • Yes, the record count is accurate. There is a blank line between the header information and the content being sent. That piece of code finds that blank line. The index is then incremented so that it is at the start of the content being returned. And yes, I have dumped the results. – Shimon Oct 23 '15 at 19:13
  • Got it, I see that's the first request where you get details. Are you sure all lines end with `\r\n` in the data? – miken32 Oct 23 '15 at 19:17
  • Yes. I have taken the content ( before doing the explode ) and written it out to a file and then imported into excel and the record count in both is the same. All records have the correct number of fields. There are not any merged records. – Shimon Oct 23 '15 at 19:29
  • I could be wrong, but: you say that the second request works in the browser, but not when cURL is reusing the connection (2nd request). Maybe it is a header problem, due to the connection reuse, or the API wants more or different header fields (text/plain vs. text/csv, keep-alive, user-agent). Is the CSV output really a GET parameter and not content-negotiated? I would suggest comparing the request headers of the browser and cURL for the 2nd URL. You might also add some more cURL debugging output with curl_getinfo() and CURLOPT_STDERR (referencing http://stackoverflow.com/a/14436877/1163786). – Jens A. Koch Oct 23 '15 at 19:46
  • I have tried multipart/form-data, text/plain and text/csv, all with the same results. The second URL request actually allows you to break the records up into smaller batches. In other words, I can request, say, 500 records at a time and repeat the request until I get all the records. So if I have 8000 records I would make 16 calls. When I get to 7167 (the 15th call) I no longer get any records! In every previous call (1-14) I get the correct number of records EVERY time. So repeated calls do not seem to be the issue. I will try some more debugging with getinfo. – Shimon Oct 23 '15 at 20:08
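The chunked retrieval described in the last comment (500 records per call until the whole set is fetched) can be sketched as a paging loop. This assumes, as in the question's code, a zero-based `pageNumber` and the `pageSize`/`pageNumber` query parameters; `page_count()`, `fetch_pages()` and the base URL are illustrative, not part of the actual API:

```php
<?php
// Sketch: fetch the batch in fixed-size pages rather than one huge request.

// Ceiling division: how many pages of $size are needed for $total records.
function page_count(int $total, int $size): int
{
    return intdiv($total + $size - 1, $size);
}

// Fetch every page of the set and concatenate the CSV chunks.
function fetch_pages(string $base, int $total, int $size): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $csv = '';
    for ($page = 0; $page < page_count($total, $size); $page++) {
        curl_setopt($ch, CURLOPT_URL,
            $base . "&pageSize=" . $size . "&pageNumber=" . $page);
        $csv .= curl_exec($ch);   // append this chunk's CSV
    }
    curl_close($ch);
    return $csv;
}

// Example call (placeholder endpoint):
// $csv = fetch_pages(
//     "https://example.invalid/records?fromDate=2015-10-19&toDate=2015-10-20"
//     . "&timezone=Eastern&outputType=csv",
//     8000, 500);
```

With 8000 records and a page size of 500, `page_count()` gives 16 requests; with 7167 it gives 15, matching the call counts described in the comment above.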

0 Answers