
I have an API call from my application to my other application via cURL, passing POST variables, like so:

$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => $url,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => $paramString,
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_TIMEOUT => 600,
    CURLOPT_CONNECTTIMEOUT => 60,
));
$response = curl_exec($curl); // returns the body as a string because of CURLOPT_RETURNTRANSFER
curl_close($curl);

In the application behind $url I try to fetch 20 million records from one table and then return them as JSON, like this:

public function apiMethod()
{
    $response = $this->_db->fetchAll('SELECT t.hash FROM table t WHERE id BETWEEN 20000000 AND 40000000;');
    echo json_encode($response);
}

Using PDO of course.

Ok, there are two problems with that:

1. fetchAll() doesn't work for so many records: it exceeds the memory limit.
2. JSON doesn't work for so many records either: calling json_encode() on the full result set also blows past the memory limit.

I've been thinking about calling cURL many times, fetching, say, 100,000 records per call instead of fetching everything at once. Is this the only way? What's the best way to do that?
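Roughly, here is what I have in mind on the calling side. This is just a sketch: the offset/limit POST parameter names are invented, so the API at $url would have to understand them.

$limit = 100000;
for ($offset = 0; ; $offset += $limit) {
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL => $url,
        CURLOPT_POST => 1,
        CURLOPT_POSTFIELDS => http_build_query(array(
            'offset' => $offset,
            'limit' => $limit,
        )),
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_TIMEOUT => 600,
        CURLOPT_CONNECTTIMEOUT => 60,
    ));
    $chunk = json_decode(curl_exec($curl), true);
    curl_close($curl);

    if (empty($chunk)) {
        break; // no more rows to fetch
    }
    // ... process this batch of rows ...
}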

khernik
  • I think you have two (or many more) options: make more, smaller calls, or consider something like rsync via ssh (or sftp) – swidmann Sep 17 '15 at 15:06
  • Tell us more about the architecture and why you want to build this. My first guess is data synchronization. I understand you have two separate applications, but do you also have two separate database servers? If you have two separate databases and need data synchronization, you should use database replication or the FEDERATED storage engine (https://dev.mysql.com/doc/refman/5.1/en/federated-storage-engine.html) – Raymond Nijland Sep 17 '15 at 15:30

2 Answers


Your main problem is architecture.

The best way, apparently, is not to have an API that requires transferring millions of rows on every call.

Either implement a method that retrieves only the few rows a caller actually needs, which is what an API is suited for, or reconsider the whole architecture: disk replication, database replication, or the like.
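For illustration only, a paginated method along these lines keeps every call small. Names here are invented, and $this->_db is assumed to be a PDO connection:

// Sketch: return one small page per call, keyed on the last id the caller saw.
public function apiMethod($lastId, $limit = 10000)
{
    $limit = (int) $limit; // inlined after casting, since LIMIT can't be a quoted bound value
    $stmt = $this->_db->prepare(
        "SELECT t.id, t.hash FROM table t WHERE t.id > ? ORDER BY t.id LIMIT $limit"
    );
    $stmt->execute(array((int) $lastId));
    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
}

The caller passes the highest id from the previous page and stops when it gets an empty array back.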

Your Common Sense

You definitely should not use fetchAll(), since it loads the entire result set into memory. Are you sure you need a full data transfer every time? Often one only needs to transfer the differences. That, of course, makes your API much more complex.
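If you really do have to send everything, iterate over the statement row by row instead of materializing the whole result. A minimal sketch, assuming $this->_db is (or exposes) a PDO connection using the pdo_mysql driver:

// Unbuffered mode keeps the result set on the MySQL server
// instead of copying it all into PHP memory at once.
$this->_db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $this->_db->query(
    'SELECT t.hash FROM table t WHERE id BETWEEN 20000000 AND 40000000'
);
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // handle exactly one row here (write it out, push it, etc.)
}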

You must either implement a stable connection and push your data in batches of, say, ten thousand rows, or prepare a file (again ten thousand rows at a time) via cronjob and transfer that file with an ordinary file-transfer method.

If you write a file, you can "fake" the JSON array brackets "[" and "]" yourself and just concatenate all the encoded rows in between.
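A sketch of that idea, again assuming a plain PDO handle (the output path is arbitrary):

// Stream a valid JSON array to disk without ever holding
// more than one row in memory.
$out = fopen('/tmp/export.json', 'w');
$stmt = $this->_db->query(
    'SELECT t.hash FROM table t WHERE id BETWEEN 20000000 AND 40000000'
);
fwrite($out, '['); // "faked" opening bracket
$first = true;
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    if (!$first) {
        fwrite($out, ',');
    }
    fwrite($out, json_encode($row)); // encode one row at a time
    $first = false;
}
fwrite($out, ']'); // "faked" closing bracket
fclose($out);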

Are you sure JSON is the right format? If you have just one column, there is not much structure to preserve, really.

flaschenpost